Training: 2022-04-27 01:45:32,689-rank_id: 0
Training: 2022-04-27 01:45:59,045-: margin_list              [1.0, 0.0, 0.4]
Training: 2022-04-27 01:45:59,046-: network                  r100
Training: 2022-04-27 01:45:59,046-: resume                   False
Training: 2022-04-27 01:45:59,046-: output                   work_dirs/wf12m_r100
Training: 2022-04-27 01:45:59,046-: embedding_size           512
Training: 2022-04-27 01:45:59,046-: sample_rate              1.0
Training: 2022-04-27 01:45:59,046-: interclass_filtering_threshold 0
Training: 2022-04-27 01:45:59,046-: fp16                     True
Training: 2022-04-27 01:45:59,046-: batch_size               128
Training: 2022-04-27 01:45:59,047-: optimizer                sgd
Training: 2022-04-27 01:45:59,047-: lr                       0.1
Training: 2022-04-27 01:45:59,047-: momentum                 0.9
Training: 2022-04-27 01:45:59,047-: weight_decay             0.0005
Training: 2022-04-27 01:45:59,047-: verbose                  2000
Training: 2022-04-27 01:45:59,047-: frequent                 10
Training: 2022-04-27 01:45:59,047-: dali                     False
Training: 2022-04-27 01:45:59,047-: rec                      /train_tmp/WebFace12M
Training: 2022-04-27 01:45:59,047-: num_classes              617970
Training: 2022-04-27 01:45:59,047-: num_image                12720066
Training: 2022-04-27 01:45:59,047-: num_epoch                20
Training: 2022-04-27 01:45:59,047-: warmup_epoch             0
Training: 2022-04-27 01:45:59,047-: val_targets              []
Training: 2022-04-27 01:45:59,047-: total_batch_size         1024
Training: 2022-04-27 01:45:59,047-: warmup_step              0
Training: 2022-04-27 01:45:59,047-: total_step               248420
Training: 2022-04-27 01:46:24,525-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-27 01:46:30,651-Speed 2986.67 samples/sec   Loss 43.0786   LearningRate 0.1000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 8192   Required: 104 hours
Training: 2022-04-27 01:46:33,957-Speed 3098.78 samples/sec   Loss 43.2348   LearningRate 0.1000   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 8192   Required: 78 hours
Training: 2022-04-27 01:46:37,403-Speed 2972.81 samples/sec   Loss 43.3577   LearningRate 0.1000   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 8192   Required: 65 hours
Training: 2022-04-27 01:46:40,698-Speed 3108.54 samples/sec   Loss 43.6135   LearningRate 0.1000   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 8192   Required: 56 hours
Training: 2022-04-27 01:46:43,997-Speed 3104.51 samples/sec   Loss 43.5974   LearningRate 0.1000   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 8192   Required: 51 hours
Training: 2022-04-27 01:46:47,337-Speed 3067.85 samples/sec   Loss 43.7322   LearningRate 0.0999   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 8192   Required: 47 hours
Training: 2022-04-27 01:46:50,765-Speed 2987.36 samples/sec   Loss 44.1046   LearningRate 0.0999   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 8192   Required: 44 hours
Training: 2022-04-27 01:46:54,137-Speed 3038.32 samples/sec   Loss 43.6538   LearningRate 0.0999   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 8192   Required: 42 hours
Training: 2022-04-27 01:46:57,424-Speed 3116.33 samples/sec   Loss 43.6409   LearningRate 0.0999   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 8192   Required: 40 hours
Training: 2022-04-27 01:47:00,749-Speed 3080.66 samples/sec   Loss 43.5258   LearningRate 0.0999   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 16384   Required: 38 hours
Training: 2022-04-27 01:47:04,023-Speed 3128.40 samples/sec   Loss 43.4639   LearningRate 0.0999   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 16384   Required: 37 hours
Training: 2022-04-27 01:47:07,313-Speed 3113.44 samples/sec   Loss 43.3916   LearningRate 0.0999   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 16384   Required: 36 hours
Training: 2022-04-27 01:47:10,679-Speed 3043.48 samples/sec   Loss 43.4332   LearningRate 0.0999   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 16384   Required: 35 hours
Training: 2022-04-27 01:47:13,977-Speed 3105.46 samples/sec   Loss 43.4014   LearningRate 0.0999   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-04-27 01:47:17,316-Speed 3068.59 samples/sec   Loss 43.3452   LearningRate 0.0999   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-04-27 01:47:20,999-Speed 2780.84 samples/sec   Loss 43.2646   LearningRate 0.0999   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-04-27 01:47:24,355-Speed 3051.92 samples/sec   Loss 43.3127   LearningRate 0.0999   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 16384   Required: 32 hours
Training: 2022-04-27 01:47:27,644-Speed 3114.35 samples/sec   Loss 43.3388   LearningRate 0.0998   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 16384   Required: 32 hours
Training: 2022-04-27 01:47:30,996-Speed 3055.68 samples/sec   Loss 43.2834   LearningRate 0.0998   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 16384   Required: 32 hours
Training: 2022-04-27 01:47:34,378-Speed 3028.81 samples/sec   Loss 43.2613   LearningRate 0.0998   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-27 01:47:37,703-Speed 3080.01 samples/sec   Loss 43.1186   LearningRate 0.0998   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-04-27 01:47:41,069-Speed 3043.19 samples/sec   Loss 43.1315   LearningRate 0.0998   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-27 01:47:44,391-Speed 3083.97 samples/sec   Loss 43.0595   LearningRate 0.0998   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-27 01:47:47,768-Speed 3032.73 samples/sec   Loss 43.0336   LearningRate 0.0998   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-27 01:47:51,079-Speed 3094.09 samples/sec   Loss 43.0296   LearningRate 0.0998   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-04-27 01:47:54,360-Speed 3121.88 samples/sec   Loss 42.9747   LearningRate 0.0998   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-27 01:47:57,656-Speed 3107.89 samples/sec   Loss 42.9624   LearningRate 0.0998   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-27 01:48:00,992-Speed 3070.60 samples/sec   Loss 42.8749   LearningRate 0.0998   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-27 01:48:04,294-Speed 3102.11 samples/sec   Loss 42.8309   LearningRate 0.0998   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-27 01:48:07,695-Speed 3012.97 samples/sec   Loss 42.8219   LearningRate 0.0998   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-04-27 01:48:11,012-Speed 3088.03 samples/sec   Loss 42.6919   LearningRate 0.0997   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-27 01:48:14,372-Speed 3048.82 samples/sec   Loss 42.5979   LearningRate 0.0997   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-27 01:48:17,728-Speed 3051.54 samples/sec   Loss 42.6776   LearningRate 0.0997   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-27 01:48:21,046-Speed 3087.55 samples/sec   Loss 42.6395   LearningRate 0.0997   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-27 01:48:24,377-Speed 3074.84 samples/sec   Loss 42.7009   LearningRate 0.0997   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-27 01:48:27,743-Speed 3042.88 samples/sec   Loss 42.5829   LearningRate 0.0997   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-04-27 01:48:31,091-Speed 3059.74 samples/sec   Loss 42.5689   LearningRate 0.0997   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-27 01:48:34,421-Speed 3076.07 samples/sec   Loss 42.4795   LearningRate 0.0997   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-27 01:48:37,769-Speed 3059.14 samples/sec   Loss 42.4920   LearningRate 0.0997   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-04-27 01:48:41,163-Speed 3017.62 samples/sec   Loss 42.4969   LearningRate 0.0997   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-27 01:48:44,526-Speed 3046.66 samples/sec   Loss 42.4195   LearningRate 0.0997   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-27 01:48:47,891-Speed 3043.67 samples/sec   Loss 42.2942   LearningRate 0.0997   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-27 01:48:51,208-Speed 3088.37 samples/sec   Loss 42.3060   LearningRate 0.0996   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-27 01:48:54,491-Speed 3120.08 samples/sec   Loss 42.2664   LearningRate 0.0996   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-27 01:48:57,826-Speed 3071.42 samples/sec   Loss 42.1994   LearningRate 0.0996   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-27 01:49:01,124-Speed 3105.27 samples/sec   Loss 42.2452   LearningRate 0.0996   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-27 01:49:04,453-Speed 3077.62 samples/sec   Loss 42.1489   LearningRate 0.0996   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-04-27 01:49:07,798-Speed 3061.76 samples/sec   Loss 42.0032   LearningRate 0.0996   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:49:11,114-Speed 3089.90 samples/sec   Loss 42.0067   LearningRate 0.0996   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:49:14,437-Speed 3081.69 samples/sec   Loss 42.0281   LearningRate 0.0996   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-27 01:49:17,735-Speed 3106.24 samples/sec   Loss 41.8687   LearningRate 0.0996   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-27 01:49:21,045-Speed 3095.01 samples/sec   Loss 41.9638   LearningRate 0.0996   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-27 01:49:24,378-Speed 3073.10 samples/sec   Loss 41.8544   LearningRate 0.0996   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-27 01:49:27,715-Speed 3069.64 samples/sec   Loss 41.8189   LearningRate 0.0996   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 262144   Required: 26 hours
Training: 2022-04-27 01:49:31,087-Speed 3037.53 samples/sec   Loss 41.7614   LearningRate 0.0995   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:49:34,433-Speed 3061.69 samples/sec   Loss 41.7508   LearningRate 0.0995   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:49:37,727-Speed 3109.42 samples/sec   Loss 41.6815   LearningRate 0.0995   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:49:41,081-Speed 3053.74 samples/sec   Loss 41.5836   LearningRate 0.0995   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:49:44,448-Speed 3042.40 samples/sec   Loss 41.6075   LearningRate 0.0995   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:49:47,778-Speed 3076.29 samples/sec   Loss 41.6037   LearningRate 0.0995   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:49:51,155-Speed 3033.38 samples/sec   Loss 41.5102   LearningRate 0.0995   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:49:54,481-Speed 3079.87 samples/sec   Loss 41.4921   LearningRate 0.0995   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:49:57,808-Speed 3079.07 samples/sec   Loss 41.5434   LearningRate 0.0995   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:50:01,164-Speed 3051.86 samples/sec   Loss 41.3123   LearningRate 0.0995   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:50:04,582-Speed 2996.47 samples/sec   Loss 41.3155   LearningRate 0.0995   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:50:07,946-Speed 3045.54 samples/sec   Loss 41.3413   LearningRate 0.0995   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-04-27 01:50:11,308-Speed 3046.67 samples/sec   Loss 41.2215   LearningRate 0.0995   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:14,634-Speed 3079.30 samples/sec   Loss 41.1014   LearningRate 0.0994   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:17,928-Speed 3109.75 samples/sec   Loss 41.1524   LearningRate 0.0994   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:21,212-Speed 3119.22 samples/sec   Loss 41.0278   LearningRate 0.0994   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:24,511-Speed 3104.50 samples/sec   Loss 41.1053   LearningRate 0.0994   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:27,848-Speed 3069.83 samples/sec   Loss 41.0712   LearningRate 0.0994   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:31,157-Speed 3096.25 samples/sec   Loss 40.9076   LearningRate 0.0994   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:34,462-Speed 3099.56 samples/sec   Loss 40.8406   LearningRate 0.0994   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:37,759-Speed 3106.54 samples/sec   Loss 40.8167   LearningRate 0.0994   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-27 01:50:41,054-Speed 3108.92 samples/sec   Loss 40.7383   LearningRate 0.0994   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:44,449-Speed 3017.09 samples/sec   Loss 40.7036   LearningRate 0.0994   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:47,832-Speed 3028.68 samples/sec   Loss 40.6246   LearningRate 0.0994   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:51,112-Speed 3122.88 samples/sec   Loss 40.6331   LearningRate 0.0994   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:54,479-Speed 3042.17 samples/sec   Loss 40.5333   LearningRate 0.0993   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:50:57,847-Speed 3041.70 samples/sec   Loss 40.4484   LearningRate 0.0993   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:01,215-Speed 3041.12 samples/sec   Loss 40.4011   LearningRate 0.0993   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:04,564-Speed 3058.55 samples/sec   Loss 40.4281   LearningRate 0.0993   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:07,959-Speed 3016.96 samples/sec   Loss 40.3924   LearningRate 0.0993   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:11,276-Speed 3087.89 samples/sec   Loss 40.2791   LearningRate 0.0993   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:14,563-Speed 3116.51 samples/sec   Loss 40.2284   LearningRate 0.0993   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 262144   Required: 25 hours
Training: 2022-04-27 01:51:17,924-Speed 3048.50 samples/sec   Loss 40.1248   LearningRate 0.0993   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:21,232-Speed 3096.65 samples/sec   Loss 40.1591   LearningRate 0.0993   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:24,583-Speed 3055.80 samples/sec   Loss 40.0711   LearningRate 0.0993   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:28,014-Speed 2985.76 samples/sec   Loss 39.9709   LearningRate 0.0993   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:31,382-Speed 3041.74 samples/sec   Loss 40.0381   LearningRate 0.0993   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:34,642-Speed 3141.91 samples/sec   Loss 39.9180   LearningRate 0.0993   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:37,996-Speed 3053.89 samples/sec   Loss 39.8496   LearningRate 0.0992   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:41,335-Speed 3068.10 samples/sec   Loss 39.9461   LearningRate 0.0992   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:44,693-Speed 3050.08 samples/sec   Loss 39.8462   LearningRate 0.0992   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:48,083-Speed 3021.12 samples/sec   Loss 39.6826   LearningRate 0.0992   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:51,366-Speed 3119.88 samples/sec   Loss 39.6276   LearningRate 0.0992   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:54,699-Speed 3072.71 samples/sec   Loss 39.7032   LearningRate 0.0992   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:51:58,050-Speed 3057.15 samples/sec   Loss 39.6781   LearningRate 0.0992   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:52:01,375-Speed 3080.80 samples/sec   Loss 39.5571   LearningRate 0.0992   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:52:04,784-Speed 3004.63 samples/sec   Loss 39.4922   LearningRate 0.0992   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:52:08,086-Speed 3102.48 samples/sec   Loss 39.4511   LearningRate 0.0992   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:52:11,405-Speed 3086.49 samples/sec   Loss 39.3832   LearningRate 0.0992   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:52:14,737-Speed 3073.96 samples/sec   Loss 39.3722   LearningRate 0.0992   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:52:18,029-Speed 3111.61 samples/sec   Loss 39.2739   LearningRate 0.0991   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:52:21,366-Speed 3070.13 samples/sec   Loss 39.1834   LearningRate 0.0991   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:52:24,646-Speed 3122.35 samples/sec   Loss 39.1309   LearningRate 0.0991   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-04-27 01:52:27,958-Speed 3093.77 samples/sec   Loss 39.1564   LearningRate 0.0991   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:52:31,267-Speed 3094.97 samples/sec   Loss 39.1441   LearningRate 0.0991   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:52:34,574-Speed 3097.53 samples/sec   Loss 39.0052   LearningRate 0.0991   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:52:37,922-Speed 3059.32 samples/sec   Loss 38.9490   LearningRate 0.0991   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:52:41,270-Speed 3059.40 samples/sec   Loss 38.9739   LearningRate 0.0991   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:52:44,578-Speed 3096.48 samples/sec   Loss 38.8139   LearningRate 0.0991   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:52:47,881-Speed 3101.30 samples/sec   Loss 38.8279   LearningRate 0.0991   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:52:51,200-Speed 3086.07 samples/sec   Loss 38.7623   LearningRate 0.0991   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:52:54,523-Speed 3082.32 samples/sec   Loss 38.8194   LearningRate 0.0991   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:52:57,831-Speed 3097.03 samples/sec   Loss 38.7396   LearningRate 0.0991   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-27 01:53:01,179-Speed 3059.50 samples/sec   Loss 38.7393   LearningRate 0.0990   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:04,487-Speed 3096.23 samples/sec   Loss 38.5133   LearningRate 0.0990   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:07,911-Speed 2992.18 samples/sec   Loss 38.6103   LearningRate 0.0990   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:11,244-Speed 3072.87 samples/sec   Loss 38.5089   LearningRate 0.0990   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:14,555-Speed 3094.31 samples/sec   Loss 38.4729   LearningRate 0.0990   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:17,901-Speed 3060.93 samples/sec   Loss 38.4560   LearningRate 0.0990   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:21,199-Speed 3106.24 samples/sec   Loss 38.4156   LearningRate 0.0990   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:24,499-Speed 3103.54 samples/sec   Loss 38.2790   LearningRate 0.0990   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:27,804-Speed 3099.17 samples/sec   Loss 38.2102   LearningRate 0.0990   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:31,065-Speed 3140.81 samples/sec   Loss 38.1405   LearningRate 0.0990   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:34,358-Speed 3110.81 samples/sec   Loss 38.1743   LearningRate 0.0990   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:37,742-Speed 3026.56 samples/sec   Loss 38.2015   LearningRate 0.0990   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:41,053-Speed 3094.24 samples/sec   Loss 38.1522   LearningRate 0.0989   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:44,423-Speed 3038.95 samples/sec   Loss 37.9972   LearningRate 0.0989   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:47,811-Speed 3023.14 samples/sec   Loss 38.0107   LearningRate 0.0989   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:51,123-Speed 3093.47 samples/sec   Loss 37.8591   LearningRate 0.0989   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:54,489-Speed 3042.53 samples/sec   Loss 37.9075   LearningRate 0.0989   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:53:57,884-Speed 3017.75 samples/sec   Loss 37.7580   LearningRate 0.0989   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:01,194-Speed 3094.34 samples/sec   Loss 37.7467   LearningRate 0.0989   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:04,536-Speed 3064.39 samples/sec   Loss 37.7431   LearningRate 0.0989   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:07,855-Speed 3086.48 samples/sec   Loss 37.6018   LearningRate 0.0989   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:11,157-Speed 3101.94 samples/sec   Loss 37.5251   LearningRate 0.0989   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:14,440-Speed 3119.81 samples/sec   Loss 37.6447   LearningRate 0.0989   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:17,723-Speed 3120.57 samples/sec   Loss 37.5073   LearningRate 0.0989   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:21,025-Speed 3102.25 samples/sec   Loss 37.5117   LearningRate 0.0989   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:24,323-Speed 3105.64 samples/sec   Loss 37.5293   LearningRate 0.0988   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:27,662-Speed 3067.63 samples/sec   Loss 37.3097   LearningRate 0.0988   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:30,939-Speed 3125.53 samples/sec   Loss 37.3272   LearningRate 0.0988   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:34,271-Speed 3074.39 samples/sec   Loss 37.1705   LearningRate 0.0988   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:37,624-Speed 3054.76 samples/sec   Loss 37.1658   LearningRate 0.0988   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:40,890-Speed 3135.90 samples/sec   Loss 37.1656   LearningRate 0.0988   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:44,321-Speed 2985.99 samples/sec   Loss 37.0639   LearningRate 0.0988   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:47,703-Speed 3028.36 samples/sec   Loss 37.0817   LearningRate 0.0988   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:51,028-Speed 3080.99 samples/sec   Loss 37.1419   LearningRate 0.0988   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:54,376-Speed 3059.56 samples/sec   Loss 36.8111   LearningRate 0.0988   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:54:57,697-Speed 3084.36 samples/sec   Loss 36.9212   LearningRate 0.0988   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:01,030-Speed 3072.75 samples/sec   Loss 36.8456   LearningRate 0.0988   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:04,360-Speed 3075.86 samples/sec   Loss 36.8577   LearningRate 0.0987   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:07,663-Speed 3101.84 samples/sec   Loss 36.8253   LearningRate 0.0987   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:11,031-Speed 3040.96 samples/sec   Loss 36.7466   LearningRate 0.0987   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:14,329-Speed 3105.71 samples/sec   Loss 36.5477   LearningRate 0.0987   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:17,630-Speed 3103.39 samples/sec   Loss 36.6174   LearningRate 0.0987   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:20,951-Speed 3084.40 samples/sec   Loss 36.6332   LearningRate 0.0987   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:24,221-Speed 3131.78 samples/sec   Loss 36.5143   LearningRate 0.0987   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:27,549-Speed 3078.10 samples/sec   Loss 36.3594   LearningRate 0.0987   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:30,847-Speed 3105.93 samples/sec   Loss 36.4516   LearningRate 0.0987   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:34,212-Speed 3044.21 samples/sec   Loss 36.3354   LearningRate 0.0987   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:37,485-Speed 3129.34 samples/sec   Loss 36.4039   LearningRate 0.0987   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:40,789-Speed 3100.77 samples/sec   Loss 36.3487   LearningRate 0.0987   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:44,110-Speed 3083.84 samples/sec   Loss 36.3298   LearningRate 0.0987   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:47,382-Speed 3130.63 samples/sec   Loss 36.1573   LearningRate 0.0986   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 262144   Required: 24 hours
Training: 2022-04-27 01:55:50,688-Speed 3098.31 samples/sec   Loss 36.0739   LearningRate 0.0986   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:54,023-Speed 3071.10 samples/sec   Loss 35.9750   LearningRate 0.0986   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:55:57,349-Speed 3079.89 samples/sec   Loss 36.1059   LearningRate 0.0986   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:00,686-Speed 3069.66 samples/sec   Loss 35.9792   LearningRate 0.0986   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:04,000-Speed 3091.19 samples/sec   Loss 35.8534   LearningRate 0.0986   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:07,286-Speed 3116.63 samples/sec   Loss 35.8342   LearningRate 0.0986   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:10,655-Speed 3041.03 samples/sec   Loss 35.6063   LearningRate 0.0986   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:14,037-Speed 3028.00 samples/sec   Loss 35.6791   LearningRate 0.0986   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:17,374-Speed 3069.78 samples/sec   Loss 35.8136   LearningRate 0.0986   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:20,658-Speed 3119.52 samples/sec   Loss 35.5772   LearningRate 0.0986   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:23,948-Speed 3113.09 samples/sec   Loss 35.5098   LearningRate 0.0986   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:27,247-Speed 3105.45 samples/sec   Loss 35.4811   LearningRate 0.0985   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:30,520-Speed 3129.10 samples/sec   Loss 35.4422   LearningRate 0.0985   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:33,812-Speed 3111.13 samples/sec   Loss 35.3456   LearningRate 0.0985   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:37,134-Speed 3083.99 samples/sec   Loss 35.2445   LearningRate 0.0985   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:40,412-Speed 3124.93 samples/sec   Loss 35.2724   LearningRate 0.0985   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:43,710-Speed 3106.12 samples/sec   Loss 35.2958   LearningRate 0.0985   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:47,102-Speed 3019.72 samples/sec   Loss 35.0837   LearningRate 0.0985   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:50,370-Speed 3134.13 samples/sec   Loss 35.1286   LearningRate 0.0985   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:53,727-Speed 3051.40 samples/sec   Loss 35.2271   LearningRate 0.0985   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:56:57,030-Speed 3101.36 samples/sec   Loss 35.0702   LearningRate 0.0985   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:00,299-Speed 3134.10 samples/sec   Loss 34.8415   LearningRate 0.0985   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:03,603-Speed 3099.25 samples/sec   Loss 34.7339   LearningRate 0.0985   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:06,919-Speed 3088.98 samples/sec   Loss 34.9060   LearningRate 0.0985   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:10,195-Speed 3127.24 samples/sec   Loss 34.7186   LearningRate 0.0984   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:13,520-Speed 3081.20 samples/sec   Loss 34.7586   LearningRate 0.0984   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:16,836-Speed 3088.69 samples/sec   Loss 34.6686   LearningRate 0.0984   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:20,143-Speed 3097.46 samples/sec   Loss 34.5672   LearningRate 0.0984   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:23,469-Speed 3079.91 samples/sec   Loss 34.5693   LearningRate 0.0984   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:26,871-Speed 3010.92 samples/sec   Loss 34.5065   LearningRate 0.0984   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:30,240-Speed 3040.45 samples/sec   Loss 34.4929   LearningRate 0.0984   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:33,584-Speed 3063.44 samples/sec   Loss 34.3285   LearningRate 0.0984   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:36,900-Speed 3088.98 samples/sec   Loss 34.3726   LearningRate 0.0984   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:40,272-Speed 3037.43 samples/sec   Loss 34.3117   LearningRate 0.0984   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:43,660-Speed 3023.75 samples/sec   Loss 34.2747   LearningRate 0.0984   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:46,995-Speed 3071.46 samples/sec   Loss 34.1439   LearningRate 0.0984   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:50,315-Speed 3085.59 samples/sec   Loss 34.1771   LearningRate 0.0983   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:53,615-Speed 3103.50 samples/sec   Loss 33.9595   LearningRate 0.0983   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:57:56,988-Speed 3037.49 samples/sec   Loss 34.0752   LearningRate 0.0983   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:00,298-Speed 3095.19 samples/sec   Loss 33.8740   LearningRate 0.0983   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:03,664-Speed 3042.90 samples/sec   Loss 33.8566   LearningRate 0.0983   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:07,033-Speed 3040.41 samples/sec   Loss 33.9075   LearningRate 0.0983   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:10,380-Speed 3060.64 samples/sec   Loss 33.8332   LearningRate 0.0983   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:13,688-Speed 3096.55 samples/sec   Loss 33.7942   LearningRate 0.0983   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:16,997-Speed 3094.86 samples/sec   Loss 33.5240   LearningRate 0.0983   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:20,289-Speed 3111.49 samples/sec   Loss 33.6107   LearningRate 0.0983   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:23,547-Speed 3144.26 samples/sec   Loss 33.3887   LearningRate 0.0983   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:26,842-Speed 3109.22 samples/sec   Loss 33.4565   LearningRate 0.0983   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:30,193-Speed 3056.25 samples/sec   Loss 33.2438   LearningRate 0.0983   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:33,573-Speed 3030.39 samples/sec   Loss 33.3416   LearningRate 0.0982   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:36,864-Speed 3112.92 samples/sec   Loss 33.1728   LearningRate 0.0982   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:40,175-Speed 3093.07 samples/sec   Loss 33.1385   LearningRate 0.0982   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:43,535-Speed 3048.97 samples/sec   Loss 33.1339   LearningRate 0.0982   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:46,822-Speed 3115.80 samples/sec   Loss 33.1817   LearningRate 0.0982   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:50,093-Speed 3131.42 samples/sec   Loss 32.9400   LearningRate 0.0982   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:53,387-Speed 3109.66 samples/sec   Loss 32.9043   LearningRate 0.0982   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:58:56,740-Speed 3055.88 samples/sec   Loss 32.9076   LearningRate 0.0982   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:59:00,121-Speed 3029.22 samples/sec   Loss 32.8108   LearningRate 0.0982   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:59:03,397-Speed 3126.98 samples/sec   Loss 32.8485   LearningRate 0.0982   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:59:06,697-Speed 3104.00 samples/sec   Loss 32.7927   LearningRate 0.0982   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:59:10,096-Speed 3013.71 samples/sec   Loss 32.7511   LearningRate 0.0982   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:59:13,423-Speed 3078.62 samples/sec   Loss 32.8202   LearningRate 0.0981   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:16,763-Speed 3067.21 samples/sec   Loss 32.6396   LearningRate 0.0981   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:20,056-Speed 3110.99 samples/sec   Loss 32.5052   LearningRate 0.0981   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:23,442-Speed 3025.39 samples/sec   Loss 32.7504   LearningRate 0.0981   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:26,695-Speed 3148.81 samples/sec   Loss 32.5501   LearningRate 0.0981   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:30,003-Speed 3095.59 samples/sec   Loss 32.2903   LearningRate 0.0981   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:33,343-Speed 3067.09 samples/sec   Loss 32.4217   LearningRate 0.0981   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:36,622-Speed 3123.53 samples/sec   Loss 32.1599   LearningRate 0.0981   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:39,991-Speed 3041.10 samples/sec   Loss 32.2417   LearningRate 0.0981   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:43,294-Speed 3100.62 samples/sec   Loss 32.2240   LearningRate 0.0981   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 01:59:46,571-Speed 3126.06 samples/sec   Loss 32.0279   LearningRate 0.0981   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:49,899-Speed 3077.95 samples/sec   Loss 32.0832   LearningRate 0.0981   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:53,266-Speed 3041.65 samples/sec   Loss 32.0519   LearningRate 0.0981   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:56,547-Speed 3122.00 samples/sec   Loss 31.8338   LearningRate 0.0980   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 01:59:59,836-Speed 3114.19 samples/sec   Loss 31.8570   LearningRate 0.0980   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:03,192-Speed 3053.20 samples/sec   Loss 31.9036   LearningRate 0.0980   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:06,564-Speed 3037.00 samples/sec   Loss 31.7529   LearningRate 0.0980   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:09,926-Speed 3050.32 samples/sec   Loss 31.5251   LearningRate 0.0980   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:13,218-Speed 3111.61 samples/sec   Loss 31.7313   LearningRate 0.0980   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:16,513-Speed 3109.07 samples/sec   Loss 31.3672   LearningRate 0.0980   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:19,836-Speed 3082.19 samples/sec   Loss 31.6885   LearningRate 0.0980   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:23,176-Speed 3067.46 samples/sec   Loss 31.4386   LearningRate 0.0980   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:26,498-Speed 3083.44 samples/sec   Loss 31.3825   LearningRate 0.0980   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:29,798-Speed 3103.73 samples/sec   Loss 31.5407   LearningRate 0.0980   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:33,126-Speed 3078.37 samples/sec   Loss 31.3864   LearningRate 0.0980   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:36,450-Speed 3082.35 samples/sec   Loss 31.3422   LearningRate 0.0979   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:39,820-Speed 3039.61 samples/sec   Loss 31.0621   LearningRate 0.0979   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:43,167-Speed 3060.25 samples/sec   Loss 31.1020   LearningRate 0.0979   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:46,453-Speed 3116.90 samples/sec   Loss 31.2324   LearningRate 0.0979   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:49,810-Speed 3050.66 samples/sec   Loss 30.8797   LearningRate 0.0979   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:53,194-Speed 3027.59 samples/sec   Loss 30.9220   LearningRate 0.0979   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:56,480-Speed 3116.55 samples/sec   Loss 30.9270   LearningRate 0.0979   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:00:59,869-Speed 3022.49 samples/sec   Loss 30.8036   LearningRate 0.0979   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:03,280-Speed 3003.05 samples/sec   Loss 30.6418   LearningRate 0.0979   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:06,688-Speed 3005.32 samples/sec   Loss 30.6986   LearningRate 0.0979   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:10,057-Speed 3040.50 samples/sec   Loss 30.6611   LearningRate 0.0979   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:13,368-Speed 3093.94 samples/sec   Loss 30.5932   LearningRate 0.0979   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:16,696-Speed 3077.29 samples/sec   Loss 30.4561   LearningRate 0.0979   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:20,061-Speed 3044.15 samples/sec   Loss 30.5832   LearningRate 0.0978   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:23,395-Speed 3072.41 samples/sec   Loss 30.5083   LearningRate 0.0978   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:26,688-Speed 3110.47 samples/sec   Loss 30.5738   LearningRate 0.0978   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:30,012-Speed 3081.57 samples/sec   Loss 30.3490   LearningRate 0.0978   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:33,357-Speed 3062.25 samples/sec   Loss 30.3122   LearningRate 0.0978   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:36,677-Speed 3085.26 samples/sec   Loss 30.2002   LearningRate 0.0978   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:39,979-Speed 3102.41 samples/sec   Loss 30.2329   LearningRate 0.0978   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:43,292-Speed 3091.22 samples/sec   Loss 29.9321   LearningRate 0.0978   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:46,633-Speed 3066.28 samples/sec   Loss 29.9931   LearningRate 0.0978   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:49,988-Speed 3052.69 samples/sec   Loss 30.0118   LearningRate 0.0978   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:53,279-Speed 3112.75 samples/sec   Loss 29.8648   LearningRate 0.0978   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:56,601-Speed 3083.82 samples/sec   Loss 29.9753   LearningRate 0.0978   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:01:59,928-Speed 3078.23 samples/sec   Loss 29.9228   LearningRate 0.0978   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:02:03,241-Speed 3091.83 samples/sec   Loss 29.7768   LearningRate 0.0977   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:02:06,547-Speed 3098.35 samples/sec   Loss 29.6698   LearningRate 0.0977   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:02:09,934-Speed 3024.39 samples/sec   Loss 29.6712   LearningRate 0.0977   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:02:13,318-Speed 3027.35 samples/sec   Loss 29.5657   LearningRate 0.0977   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:02:16,671-Speed 3054.35 samples/sec   Loss 29.6017   LearningRate 0.0977   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:02:19,979-Speed 3096.53 samples/sec   Loss 29.4563   LearningRate 0.0977   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:02:23,268-Speed 3114.28 samples/sec   Loss 29.4756   LearningRate 0.0977   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:02:26,605-Speed 3069.63 samples/sec   Loss 29.3770   LearningRate 0.0977   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:02:29,880-Speed 3127.55 samples/sec   Loss 29.4774   LearningRate 0.0977   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:02:33,165-Speed 3118.03 samples/sec   Loss 29.3943   LearningRate 0.0977   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:02:36,469-Speed 3100.26 samples/sec   Loss 29.2910   LearningRate 0.0977   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:02:39,794-Speed 3081.32 samples/sec   Loss 29.0759   LearningRate 0.0977   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:02:43,118-Speed 3081.47 samples/sec   Loss 29.1090   LearningRate 0.0976   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:02:46,430-Speed 3092.39 samples/sec   Loss 28.7973   LearningRate 0.0976   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:02:49,729-Speed 3105.46 samples/sec   Loss 28.9474   LearningRate 0.0976   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:02:53,036-Speed 3097.74 samples/sec   Loss 28.9387   LearningRate 0.0976   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:02:56,340-Speed 3099.94 samples/sec   Loss 28.7894   LearningRate 0.0976   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:02:59,649-Speed 3095.26 samples/sec   Loss 28.7997   LearningRate 0.0976   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:02,978-Speed 3076.68 samples/sec   Loss 28.7475   LearningRate 0.0976   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:06,383-Speed 3008.69 samples/sec   Loss 28.5509   LearningRate 0.0976   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:09,741-Speed 3049.90 samples/sec   Loss 28.7862   LearningRate 0.0976   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:13,083-Speed 3064.77 samples/sec   Loss 28.5644   LearningRate 0.0976   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:16,420-Speed 3070.18 samples/sec   Loss 28.4914   LearningRate 0.0976   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:19,777-Speed 3050.27 samples/sec   Loss 28.5841   LearningRate 0.0976   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:23,044-Speed 3135.90 samples/sec   Loss 28.3424   LearningRate 0.0976   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:26,421-Speed 3032.93 samples/sec   Loss 28.3289   LearningRate 0.0975   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:29,674-Speed 3148.72 samples/sec   Loss 28.3370   LearningRate 0.0975   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:33,044-Speed 3039.66 samples/sec   Loss 28.3780   LearningRate 0.0975   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:36,324-Speed 3123.20 samples/sec   Loss 28.1094   LearningRate 0.0975   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:39,694-Speed 3038.75 samples/sec   Loss 28.2430   LearningRate 0.0975   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:43,045-Speed 3057.36 samples/sec   Loss 28.1172   LearningRate 0.0975   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:46,349-Speed 3100.00 samples/sec   Loss 28.0590   LearningRate 0.0975   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:49,719-Speed 3039.02 samples/sec   Loss 28.0624   LearningRate 0.0975   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:53,036-Speed 3088.17 samples/sec   Loss 27.9393   LearningRate 0.0975   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:56,365-Speed 3077.49 samples/sec   Loss 27.7438   LearningRate 0.0975   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:03:59,686-Speed 3083.18 samples/sec   Loss 27.8596   LearningRate 0.0975   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:02,994-Speed 3096.48 samples/sec   Loss 27.8855   LearningRate 0.0975   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:06,286-Speed 3112.01 samples/sec   Loss 27.6791   LearningRate 0.0974   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:09,566-Speed 3123.23 samples/sec   Loss 27.6195   LearningRate 0.0974   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:12,850-Speed 3119.64 samples/sec   Loss 27.7516   LearningRate 0.0974   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:16,166-Speed 3089.67 samples/sec   Loss 27.4768   LearningRate 0.0974   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:19,440-Speed 3128.75 samples/sec   Loss 27.5868   LearningRate 0.0974   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:22,758-Speed 3086.50 samples/sec   Loss 27.4877   LearningRate 0.0974   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 02:04:26,039-Speed 3122.04 samples/sec   Loss 27.2445   LearningRate 0.0974   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:29,384-Speed 3062.35 samples/sec   Loss 27.1793   LearningRate 0.0974   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:32,744-Speed 3048.93 samples/sec   Loss 27.3449   LearningRate 0.0974   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:36,042-Speed 3105.52 samples/sec   Loss 27.4291   LearningRate 0.0974   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:39,350-Speed 3096.33 samples/sec   Loss 27.3572   LearningRate 0.0974   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:42,647-Speed 3107.27 samples/sec   Loss 27.0959   LearningRate 0.0974   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:45,946-Speed 3105.18 samples/sec   Loss 27.2882   LearningRate 0.0974   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:49,300-Speed 3053.44 samples/sec   Loss 27.1064   LearningRate 0.0973   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:52,605-Speed 3099.93 samples/sec   Loss 26.9723   LearningRate 0.0973   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:55,923-Speed 3087.82 samples/sec   Loss 26.8946   LearningRate 0.0973   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:04:59,256-Speed 3072.80 samples/sec   Loss 26.9063   LearningRate 0.0973   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:02,580-Speed 3082.04 samples/sec   Loss 27.0367   LearningRate 0.0973   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:06,011-Speed 2984.85 samples/sec   Loss 26.6745   LearningRate 0.0973   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:09,332-Speed 3084.76 samples/sec   Loss 26.5915   LearningRate 0.0973   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:12,665-Speed 3073.33 samples/sec   Loss 26.6301   LearningRate 0.0973   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:16,024-Speed 3049.37 samples/sec   Loss 26.6925   LearningRate 0.0973   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:19,390-Speed 3042.41 samples/sec   Loss 26.5709   LearningRate 0.0973   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:22,685-Speed 3109.31 samples/sec   Loss 26.6163   LearningRate 0.0973   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:26,001-Speed 3088.95 samples/sec   Loss 26.6636   LearningRate 0.0973   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:29,345-Speed 3062.69 samples/sec   Loss 26.3716   LearningRate 0.0972   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:32,666-Speed 3084.31 samples/sec   Loss 26.3276   LearningRate 0.0972   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:36,032-Speed 3043.53 samples/sec   Loss 26.4196   LearningRate 0.0972   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:39,360-Speed 3078.12 samples/sec   Loss 26.4178   LearningRate 0.0972   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:42,735-Speed 3034.78 samples/sec   Loss 26.0574   LearningRate 0.0972   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:46,057-Speed 3083.48 samples/sec   Loss 26.0779   LearningRate 0.0972   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:49,381-Speed 3081.88 samples/sec   Loss 26.4132   LearningRate 0.0972   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:52,683-Speed 3102.51 samples/sec   Loss 26.0668   LearningRate 0.0972   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:55,954-Speed 3131.05 samples/sec   Loss 25.9855   LearningRate 0.0972   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:05:59,292-Speed 3068.53 samples/sec   Loss 25.9808   LearningRate 0.0972   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:02,641-Speed 3058.68 samples/sec   Loss 26.0350   LearningRate 0.0972   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:05,969-Speed 3077.73 samples/sec   Loss 25.9078   LearningRate 0.0972   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:09,255-Speed 3117.65 samples/sec   Loss 26.0577   LearningRate 0.0972   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:12,523-Speed 3133.80 samples/sec   Loss 25.8256   LearningRate 0.0971   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:15,775-Speed 3149.86 samples/sec   Loss 25.9546   LearningRate 0.0971   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:19,120-Speed 3062.52 samples/sec   Loss 25.7256   LearningRate 0.0971   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:22,474-Speed 3054.50 samples/sec   Loss 25.6417   LearningRate 0.0971   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:25,776-Speed 3102.08 samples/sec   Loss 25.7033   LearningRate 0.0971   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:29,118-Speed 3065.24 samples/sec   Loss 25.5563   LearningRate 0.0971   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:32,471-Speed 3054.32 samples/sec   Loss 25.7536   LearningRate 0.0971   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:35,809-Speed 3068.67 samples/sec   Loss 25.6258   LearningRate 0.0971   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:39,168-Speed 3049.47 samples/sec   Loss 25.5538   LearningRate 0.0971   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:42,541-Speed 3037.22 samples/sec   Loss 25.2441   LearningRate 0.0971   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:45,828-Speed 3115.52 samples/sec   Loss 25.6325   LearningRate 0.0971   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:49,091-Speed 3139.46 samples/sec   Loss 25.4819   LearningRate 0.0971   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:52,472-Speed 3029.93 samples/sec   Loss 25.3649   LearningRate 0.0971   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:55,789-Speed 3088.14 samples/sec   Loss 25.1092   LearningRate 0.0970   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:06:59,084-Speed 3108.91 samples/sec   Loss 25.2284   LearningRate 0.0970   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:02,355-Speed 3131.13 samples/sec   Loss 25.4571   LearningRate 0.0970   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:05,730-Speed 3034.71 samples/sec   Loss 25.0015   LearningRate 0.0970   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:09,119-Speed 3022.61 samples/sec   Loss 25.1340   LearningRate 0.0970   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:12,462-Speed 3065.19 samples/sec   Loss 25.2063   LearningRate 0.0970   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:15,778-Speed 3089.06 samples/sec   Loss 24.8189   LearningRate 0.0970   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:19,109-Speed 3074.73 samples/sec   Loss 24.9523   LearningRate 0.0970   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:22,458-Speed 3058.55 samples/sec   Loss 24.9829   LearningRate 0.0970   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:25,774-Speed 3089.22 samples/sec   Loss 25.0947   LearningRate 0.0970   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:29,134-Speed 3048.58 samples/sec   Loss 24.9585   LearningRate 0.0970   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:32,506-Speed 3037.25 samples/sec   Loss 24.8407   LearningRate 0.0970   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:35,880-Speed 3036.13 samples/sec   Loss 24.8534   LearningRate 0.0969   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:39,245-Speed 3043.87 samples/sec   Loss 24.7817   LearningRate 0.0969   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:42,541-Speed 3107.71 samples/sec   Loss 24.7630   LearningRate 0.0969   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:45,864-Speed 3082.32 samples/sec   Loss 24.7185   LearningRate 0.0969   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:49,248-Speed 3027.04 samples/sec   Loss 24.7164   LearningRate 0.0969   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:52,577-Speed 3076.96 samples/sec   Loss 24.7998   LearningRate 0.0969   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:55,942-Speed 3045.02 samples/sec   Loss 24.5094   LearningRate 0.0969   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:07:59,238-Speed 3108.04 samples/sec   Loss 24.5662   LearningRate 0.0969   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:02,570-Speed 3073.67 samples/sec   Loss 24.3349   LearningRate 0.0969   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:05,895-Speed 3080.89 samples/sec   Loss 24.4935   LearningRate 0.0969   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:09,170-Speed 3127.83 samples/sec   Loss 24.6849   LearningRate 0.0969   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:12,515-Speed 3061.52 samples/sec   Loss 24.5446   LearningRate 0.0969   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:15,843-Speed 3078.06 samples/sec   Loss 24.4875   LearningRate 0.0969   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:19,168-Speed 3081.07 samples/sec   Loss 24.3728   LearningRate 0.0968   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 02:08:22,453-Speed 3117.86 samples/sec   Loss 24.4370   LearningRate 0.0968   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:25,787-Speed 3072.37 samples/sec   Loss 24.2910   LearningRate 0.0968   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:29,116-Speed 3077.02 samples/sec   Loss 24.2849   LearningRate 0.0968   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:32,433-Speed 3088.24 samples/sec   Loss 24.2342   LearningRate 0.0968   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:35,811-Speed 3032.30 samples/sec   Loss 24.2191   LearningRate 0.0968   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:39,102-Speed 3111.82 samples/sec   Loss 23.9630   LearningRate 0.0968   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:42,449-Speed 3060.17 samples/sec   Loss 24.0225   LearningRate 0.0968   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:45,789-Speed 3066.85 samples/sec   Loss 24.1475   LearningRate 0.0968   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:08:49,120-Speed 3074.72 samples/sec   Loss 23.9153   LearningRate 0.0968   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:08:52,424-Speed 3101.89 samples/sec   Loss 23.9565   LearningRate 0.0968   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:08:55,785-Speed 3047.13 samples/sec   Loss 23.8974   LearningRate 0.0968   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:08:59,146-Speed 3048.65 samples/sec   Loss 24.0669   LearningRate 0.0968   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:09:02,499-Speed 3054.99 samples/sec   Loss 23.9203   LearningRate 0.0967   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:09:05,832-Speed 3073.10 samples/sec   Loss 23.8108   LearningRate 0.0967   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:09:09,129-Speed 3106.80 samples/sec   Loss 23.8865   LearningRate 0.0967   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:09:12,457-Speed 3077.51 samples/sec   Loss 23.7157   LearningRate 0.0967   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:09:15,729-Speed 3131.41 samples/sec   Loss 23.7699   LearningRate 0.0967   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:09:18,988-Speed 3142.32 samples/sec   Loss 23.6028   LearningRate 0.0967   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:09:22,323-Speed 3072.55 samples/sec   Loss 23.6766   LearningRate 0.0967   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:09:25,590-Speed 3134.97 samples/sec   Loss 23.7779   LearningRate 0.0967   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:09:28,902-Speed 3093.53 samples/sec   Loss 23.7513   LearningRate 0.0967   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:09:32,244-Speed 3064.24 samples/sec   Loss 23.5256   LearningRate 0.0967   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:09:35,602-Speed 3051.07 samples/sec   Loss 23.5894   LearningRate 0.0967   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:09:38,973-Speed 3038.30 samples/sec   Loss 23.4954   LearningRate 0.0967   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:09:42,350-Speed 3033.38 samples/sec   Loss 23.3024   LearningRate 0.0966   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:09:45,693-Speed 3063.70 samples/sec   Loss 23.4206   LearningRate 0.0966   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:09:49,046-Speed 3055.30 samples/sec   Loss 23.2757   LearningRate 0.0966   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:09:52,346-Speed 3103.49 samples/sec   Loss 23.4442   LearningRate 0.0966   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:09:55,652-Speed 3098.37 samples/sec   Loss 23.2861   LearningRate 0.0966   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 02:09:58,945-Speed 3110.97 samples/sec   Loss 23.2979   LearningRate 0.0966   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:02,274-Speed 3077.20 samples/sec   Loss 23.3369   LearningRate 0.0966   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:05,598-Speed 3081.80 samples/sec   Loss 23.3448   LearningRate 0.0966   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:08,883-Speed 3117.36 samples/sec   Loss 23.0737   LearningRate 0.0966   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:12,161-Speed 3125.32 samples/sec   Loss 23.1285   LearningRate 0.0966   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:15,545-Speed 3026.45 samples/sec   Loss 23.2084   LearningRate 0.0966   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:18,883-Speed 3068.94 samples/sec   Loss 23.0059   LearningRate 0.0966   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:22,172-Speed 3114.65 samples/sec   Loss 23.0956   LearningRate 0.0966   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:25,492-Speed 3084.65 samples/sec   Loss 23.1937   LearningRate 0.0965   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:28,865-Speed 3036.65 samples/sec   Loss 23.1930   LearningRate 0.0965   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:32,140-Speed 3128.22 samples/sec   Loss 23.0771   LearningRate 0.0965   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:35,474-Speed 3072.51 samples/sec   Loss 22.8319   LearningRate 0.0965   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:38,777-Speed 3101.07 samples/sec   Loss 23.0082   LearningRate 0.0965   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:42,072-Speed 3108.93 samples/sec   Loss 22.9098   LearningRate 0.0965   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:45,382-Speed 3094.43 samples/sec   Loss 22.8651   LearningRate 0.0965   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:48,723-Speed 3065.86 samples/sec   Loss 22.8753   LearningRate 0.0965   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:52,116-Speed 3018.74 samples/sec   Loss 23.0938   LearningRate 0.0965   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:55,462-Speed 3061.03 samples/sec   Loss 22.7714   LearningRate 0.0965   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:10:58,769-Speed 3097.97 samples/sec   Loss 22.7717   LearningRate 0.0965   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:02,181-Speed 3001.75 samples/sec   Loss 22.7131   LearningRate 0.0965   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:05,462-Speed 3122.17 samples/sec   Loss 22.6061   LearningRate 0.0964   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 02:11:08,714-Speed 3149.84 samples/sec   Loss 22.6175   LearningRate 0.0964   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:12,053-Speed 3067.84 samples/sec   Loss 22.6380   LearningRate 0.0964   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:15,396-Speed 3063.75 samples/sec   Loss 22.6868   LearningRate 0.0964   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:18,687-Speed 3112.91 samples/sec   Loss 22.7763   LearningRate 0.0964   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:21,936-Speed 3153.19 samples/sec   Loss 22.7025   LearningRate 0.0964   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:25,220-Speed 3118.89 samples/sec   Loss 22.6199   LearningRate 0.0964   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:28,506-Speed 3117.51 samples/sec   Loss 22.6784   LearningRate 0.0964   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:31,780-Speed 3128.73 samples/sec   Loss 22.5210   LearningRate 0.0964   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:35,051-Speed 3130.85 samples/sec   Loss 22.4915   LearningRate 0.0964   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:38,351-Speed 3104.33 samples/sec   Loss 22.4563   LearningRate 0.0964   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:41,631-Speed 3123.54 samples/sec   Loss 22.3185   LearningRate 0.0964   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:44,940-Speed 3095.73 samples/sec   Loss 22.6775   LearningRate 0.0964   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:48,272-Speed 3073.88 samples/sec   Loss 22.1786   LearningRate 0.0963   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:51,536-Speed 3138.14 samples/sec   Loss 22.5031   LearningRate 0.0963   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:54,830-Speed 3109.33 samples/sec   Loss 22.2366   LearningRate 0.0963   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:11:58,134-Speed 3100.20 samples/sec   Loss 22.3342   LearningRate 0.0963   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:01,455-Speed 3084.56 samples/sec   Loss 22.1707   LearningRate 0.0963   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:04,789-Speed 3072.99 samples/sec   Loss 22.2078   LearningRate 0.0963   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:08,074-Speed 3118.62 samples/sec   Loss 22.1972   LearningRate 0.0963   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:11,461-Speed 3024.22 samples/sec   Loss 22.2321   LearningRate 0.0963   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:14,784-Speed 3082.30 samples/sec   Loss 22.2108   LearningRate 0.0963   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 02:12:18,099-Speed 3090.50 samples/sec   Loss 21.9838   LearningRate 0.0963   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:21,391-Speed 3110.77 samples/sec   Loss 22.2515   LearningRate 0.0963   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:24,735-Speed 3063.70 samples/sec   Loss 22.0754   LearningRate 0.0963   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:28,070-Speed 3071.07 samples/sec   Loss 22.0252   LearningRate 0.0963   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:31,403-Speed 3073.06 samples/sec   Loss 21.8959   LearningRate 0.0962   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:34,713-Speed 3095.45 samples/sec   Loss 21.8883   LearningRate 0.0962   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:38,016-Speed 3100.37 samples/sec   Loss 22.0420   LearningRate 0.0962   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:41,306-Speed 3113.66 samples/sec   Loss 22.1140   LearningRate 0.0962   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:44,618-Speed 3092.20 samples/sec   Loss 22.1479   LearningRate 0.0962   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:47,866-Speed 3154.06 samples/sec   Loss 21.7360   LearningRate 0.0962   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:51,145-Speed 3123.70 samples/sec   Loss 21.9875   LearningRate 0.0962   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:54,454-Speed 3095.44 samples/sec   Loss 21.8168   LearningRate 0.0962   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:12:57,832-Speed 3032.33 samples/sec   Loss 21.8797   LearningRate 0.0962   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:01,165-Speed 3073.48 samples/sec   Loss 21.8195   LearningRate 0.0962   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:04,488-Speed 3083.27 samples/sec   Loss 21.8953   LearningRate 0.0962   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:07,796-Speed 3096.01 samples/sec   Loss 21.6565   LearningRate 0.0962   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:11,139-Speed 3063.62 samples/sec   Loss 21.8401   LearningRate 0.0961   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:14,559-Speed 2994.88 samples/sec   Loss 21.8014   LearningRate 0.0961   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:17,868-Speed 3095.77 samples/sec   Loss 21.6897   LearningRate 0.0961   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:21,270-Speed 3011.53 samples/sec   Loss 21.5503   LearningRate 0.0961   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:24,556-Speed 3116.43 samples/sec   Loss 21.7144   LearningRate 0.0961   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:27,930-Speed 3036.89 samples/sec   Loss 21.7674   LearningRate 0.0961   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:31,282-Speed 3054.84 samples/sec   Loss 21.4156   LearningRate 0.0961   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:34,621-Speed 3068.71 samples/sec   Loss 21.7066   LearningRate 0.0961   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:37,986-Speed 3043.84 samples/sec   Loss 21.4675   LearningRate 0.0961   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:41,324-Speed 3068.43 samples/sec   Loss 21.4850   LearningRate 0.0961   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:44,639-Speed 3089.48 samples/sec   Loss 21.3853   LearningRate 0.0961   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:48,014-Speed 3035.64 samples/sec   Loss 21.3980   LearningRate 0.0961   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:51,301-Speed 3116.37 samples/sec   Loss 21.3667   LearningRate 0.0961   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:54,594-Speed 3109.97 samples/sec   Loss 21.5551   LearningRate 0.0960   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:13:57,871-Speed 3125.38 samples/sec   Loss 21.4481   LearningRate 0.0960   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:01,154-Speed 3120.45 samples/sec   Loss 21.4850   LearningRate 0.0960   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:04,470-Speed 3089.27 samples/sec   Loss 21.5830   LearningRate 0.0960   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:07,758-Speed 3115.07 samples/sec   Loss 21.4911   LearningRate 0.0960   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:11,018-Speed 3142.37 samples/sec   Loss 21.3204   LearningRate 0.0960   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:14,267-Speed 3152.50 samples/sec   Loss 21.4283   LearningRate 0.0960   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:17,528-Speed 3140.80 samples/sec   Loss 21.4043   LearningRate 0.0960   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:20,842-Speed 3091.39 samples/sec   Loss 21.5063   LearningRate 0.0960   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:24,143-Speed 3102.68 samples/sec   Loss 21.2209   LearningRate 0.0960   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:27,447-Speed 3100.98 samples/sec   Loss 21.4731   LearningRate 0.0960   Epoch: 0   Global Step: 5060   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:30,846-Speed 3013.82 samples/sec   Loss 21.1527   LearningRate 0.0960   Epoch: 0   Global Step: 5070   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 02:14:34,137-Speed 3112.45 samples/sec   Loss 21.2061   LearningRate 0.0960   Epoch: 0   Global Step: 5080   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:37,439-Speed 3102.35 samples/sec   Loss 21.3664   LearningRate 0.0959   Epoch: 0   Global Step: 5090   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:40,756-Speed 3087.88 samples/sec   Loss 21.2253   LearningRate 0.0959   Epoch: 0   Global Step: 5100   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:44,102-Speed 3061.73 samples/sec   Loss 20.9697   LearningRate 0.0959   Epoch: 0   Global Step: 5110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:47,450-Speed 3058.96 samples/sec   Loss 21.2181   LearningRate 0.0959   Epoch: 0   Global Step: 5120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:50,709-Speed 3143.44 samples/sec   Loss 20.9689   LearningRate 0.0959   Epoch: 0   Global Step: 5130   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:54,076-Speed 3042.41 samples/sec   Loss 20.9489   LearningRate 0.0959   Epoch: 0   Global Step: 5140   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:14:57,399-Speed 3082.29 samples/sec   Loss 21.0492   LearningRate 0.0959   Epoch: 0   Global Step: 5150   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:00,746-Speed 3060.53 samples/sec   Loss 21.1690   LearningRate 0.0959   Epoch: 0   Global Step: 5160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:04,009-Speed 3138.74 samples/sec   Loss 21.1233   LearningRate 0.0959   Epoch: 0   Global Step: 5170   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:07,285-Speed 3126.84 samples/sec   Loss 20.9748   LearningRate 0.0959   Epoch: 0   Global Step: 5180   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:10,553-Speed 3134.67 samples/sec   Loss 20.9830   LearningRate 0.0959   Epoch: 0   Global Step: 5190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:13,858-Speed 3098.85 samples/sec   Loss 20.7804   LearningRate 0.0959   Epoch: 0   Global Step: 5200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:17,177-Speed 3086.93 samples/sec   Loss 20.8297   LearningRate 0.0958   Epoch: 0   Global Step: 5210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:20,541-Speed 3044.37 samples/sec   Loss 20.9841   LearningRate 0.0958   Epoch: 0   Global Step: 5220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:23,874-Speed 3073.68 samples/sec   Loss 20.9984   LearningRate 0.0958   Epoch: 0   Global Step: 5230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:27,176-Speed 3101.99 samples/sec   Loss 21.0428   LearningRate 0.0958   Epoch: 0   Global Step: 5240   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:30,472-Speed 3107.62 samples/sec   Loss 21.0216   LearningRate 0.0958   Epoch: 0   Global Step: 5250   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:33,820-Speed 3059.26 samples/sec   Loss 21.0086   LearningRate 0.0958   Epoch: 0   Global Step: 5260   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:37,150-Speed 3075.77 samples/sec   Loss 20.9215   LearningRate 0.0958   Epoch: 0   Global Step: 5270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:40,425-Speed 3127.98 samples/sec   Loss 20.7102   LearningRate 0.0958   Epoch: 0   Global Step: 5280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:43,720-Speed 3108.74 samples/sec   Loss 20.7838   LearningRate 0.0958   Epoch: 0   Global Step: 5290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:47,047-Speed 3078.70 samples/sec   Loss 20.7228   LearningRate 0.0958   Epoch: 0   Global Step: 5300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:50,367-Speed 3085.70 samples/sec   Loss 20.9335   LearningRate 0.0958   Epoch: 0   Global Step: 5310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:53,705-Speed 3068.32 samples/sec   Loss 20.7133   LearningRate 0.0958   Epoch: 0   Global Step: 5320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:15:57,046-Speed 3065.68 samples/sec   Loss 20.5680   LearningRate 0.0958   Epoch: 0   Global Step: 5330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:00,424-Speed 3032.56 samples/sec   Loss 20.9356   LearningRate 0.0957   Epoch: 0   Global Step: 5340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:03,719-Speed 3109.49 samples/sec   Loss 20.8468   LearningRate 0.0957   Epoch: 0   Global Step: 5350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:07,010-Speed 3112.76 samples/sec   Loss 20.9152   LearningRate 0.0957   Epoch: 0   Global Step: 5360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:10,381-Speed 3037.76 samples/sec   Loss 20.8011   LearningRate 0.0957   Epoch: 0   Global Step: 5370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:13,707-Speed 3080.07 samples/sec   Loss 20.8242   LearningRate 0.0957   Epoch: 0   Global Step: 5380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:17,052-Speed 3062.11 samples/sec   Loss 20.5695   LearningRate 0.0957   Epoch: 0   Global Step: 5390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:20,393-Speed 3071.26 samples/sec   Loss 20.5892   LearningRate 0.0957   Epoch: 0   Global Step: 5400   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:23,723-Speed 3075.97 samples/sec   Loss 20.6710   LearningRate 0.0957   Epoch: 0   Global Step: 5410   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:26,976-Speed 3148.49 samples/sec   Loss 20.3867   LearningRate 0.0957   Epoch: 0   Global Step: 5420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:30,319-Speed 3064.62 samples/sec   Loss 20.5128   LearningRate 0.0957   Epoch: 0   Global Step: 5430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:33,706-Speed 3024.37 samples/sec   Loss 20.4502   LearningRate 0.0957   Epoch: 0   Global Step: 5440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:37,057-Speed 3056.03 samples/sec   Loss 20.5432   LearningRate 0.0957   Epoch: 0   Global Step: 5450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:40,377-Speed 3085.46 samples/sec   Loss 20.4638   LearningRate 0.0957   Epoch: 0   Global Step: 5460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:43,675-Speed 3105.57 samples/sec   Loss 20.5217   LearningRate 0.0956   Epoch: 0   Global Step: 5470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:46,994-Speed 3087.44 samples/sec   Loss 20.5820   LearningRate 0.0956   Epoch: 0   Global Step: 5480   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 02:16:50,355-Speed 3047.87 samples/sec   Loss 20.4384   LearningRate 0.0956   Epoch: 0   Global Step: 5490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:53,733-Speed 3032.15 samples/sec   Loss 20.6463   LearningRate 0.0956   Epoch: 0   Global Step: 5500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:16:57,001-Speed 3134.49 samples/sec   Loss 20.5828   LearningRate 0.0956   Epoch: 0   Global Step: 5510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:00,348-Speed 3060.32 samples/sec   Loss 20.4949   LearningRate 0.0956   Epoch: 0   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:03,652-Speed 3100.46 samples/sec   Loss 20.3902   LearningRate 0.0956   Epoch: 0   Global Step: 5530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:06,988-Speed 3070.29 samples/sec   Loss 20.5649   LearningRate 0.0956   Epoch: 0   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:10,305-Speed 3087.81 samples/sec   Loss 20.5011   LearningRate 0.0956   Epoch: 0   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:13,571-Speed 3136.43 samples/sec   Loss 20.5212   LearningRate 0.0956   Epoch: 0   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:16,869-Speed 3106.79 samples/sec   Loss 20.4061   LearningRate 0.0956   Epoch: 0   Global Step: 5570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:20,182-Speed 3090.96 samples/sec   Loss 20.3915   LearningRate 0.0956   Epoch: 0   Global Step: 5580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:23,502-Speed 3086.67 samples/sec   Loss 20.4285   LearningRate 0.0956   Epoch: 0   Global Step: 5590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:26,858-Speed 3051.57 samples/sec   Loss 20.4319   LearningRate 0.0955   Epoch: 0   Global Step: 5600   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:30,215-Speed 3051.22 samples/sec   Loss 20.3588   LearningRate 0.0955   Epoch: 0   Global Step: 5610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:33,568-Speed 3054.78 samples/sec   Loss 20.2047   LearningRate 0.0955   Epoch: 0   Global Step: 5620   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:36,888-Speed 3085.94 samples/sec   Loss 20.2525   LearningRate 0.0955   Epoch: 0   Global Step: 5630   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:40,186-Speed 3105.27 samples/sec   Loss 20.2722   LearningRate 0.0955   Epoch: 0   Global Step: 5640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:43,575-Speed 3022.85 samples/sec   Loss 20.3437   LearningRate 0.0955   Epoch: 0   Global Step: 5650   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:46,850-Speed 3127.58 samples/sec   Loss 20.1233   LearningRate 0.0955   Epoch: 0   Global Step: 5660   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:50,147-Speed 3105.92 samples/sec   Loss 20.2053   LearningRate 0.0955   Epoch: 0   Global Step: 5670   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:53,406-Speed 3143.57 samples/sec   Loss 20.1774   LearningRate 0.0955   Epoch: 0   Global Step: 5680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:17:56,681-Speed 3127.08 samples/sec   Loss 20.3364   LearningRate 0.0955   Epoch: 0   Global Step: 5690   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 02:17:59,996-Speed 3089.79 samples/sec   Loss 20.2994   LearningRate 0.0955   Epoch: 0   Global Step: 5700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:03,263-Speed 3135.97 samples/sec   Loss 20.1159   LearningRate 0.0955   Epoch: 0   Global Step: 5710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:06,629-Speed 3043.19 samples/sec   Loss 20.2102   LearningRate 0.0954   Epoch: 0   Global Step: 5720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:10,001-Speed 3037.94 samples/sec   Loss 19.9816   LearningRate 0.0954   Epoch: 0   Global Step: 5730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:13,324-Speed 3081.91 samples/sec   Loss 20.1696   LearningRate 0.0954   Epoch: 0   Global Step: 5740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:16,661-Speed 3069.64 samples/sec   Loss 20.0288   LearningRate 0.0954   Epoch: 0   Global Step: 5750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:19,985-Speed 3081.63 samples/sec   Loss 19.9735   LearningRate 0.0954   Epoch: 0   Global Step: 5760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:23,376-Speed 3020.68 samples/sec   Loss 19.9983   LearningRate 0.0954   Epoch: 0   Global Step: 5770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:26,711-Speed 3071.06 samples/sec   Loss 20.2316   LearningRate 0.0954   Epoch: 0   Global Step: 5780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:30,056-Speed 3062.60 samples/sec   Loss 20.0249   LearningRate 0.0954   Epoch: 0   Global Step: 5790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:33,418-Speed 3046.08 samples/sec   Loss 19.9719   LearningRate 0.0954   Epoch: 0   Global Step: 5800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:36,807-Speed 3022.69 samples/sec   Loss 20.0541   LearningRate 0.0954   Epoch: 0   Global Step: 5810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:40,112-Speed 3099.40 samples/sec   Loss 19.8148   LearningRate 0.0954   Epoch: 0   Global Step: 5820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:43,436-Speed 3082.25 samples/sec   Loss 20.0510   LearningRate 0.0954   Epoch: 0   Global Step: 5830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:46,771-Speed 3071.01 samples/sec   Loss 20.0259   LearningRate 0.0954   Epoch: 0   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:50,125-Speed 3054.58 samples/sec   Loss 20.0811   LearningRate 0.0953   Epoch: 0   Global Step: 5850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:53,413-Speed 3115.80 samples/sec   Loss 19.8388   LearningRate 0.0953   Epoch: 0   Global Step: 5860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:18:56,736-Speed 3082.49 samples/sec   Loss 19.9428   LearningRate 0.0953   Epoch: 0   Global Step: 5870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:19:00,082-Speed 3061.38 samples/sec   Loss 20.0231   LearningRate 0.0953   Epoch: 0   Global Step: 5880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:19:03,403-Speed 3084.25 samples/sec   Loss 20.0548   LearningRate 0.0953   Epoch: 0   Global Step: 5890   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:19:06,725-Speed 3083.36 samples/sec   Loss 19.9491   LearningRate 0.0953   Epoch: 0   Global Step: 5900   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:19:10,065-Speed 3067.24 samples/sec   Loss 19.7273   LearningRate 0.0953   Epoch: 0   Global Step: 5910   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:19:13,377-Speed 3092.24 samples/sec   Loss 19.7878   LearningRate 0.0953   Epoch: 0   Global Step: 5920   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:19:16,654-Speed 3126.33 samples/sec   Loss 19.9982   LearningRate 0.0953   Epoch: 0   Global Step: 5930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:19:20,022-Speed 3040.84 samples/sec   Loss 19.9279   LearningRate 0.0953   Epoch: 0   Global Step: 5940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:19:23,394-Speed 3037.75 samples/sec   Loss 20.0919   LearningRate 0.0953   Epoch: 0   Global Step: 5950   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:19:26,730-Speed 3070.73 samples/sec   Loss 19.8975   LearningRate 0.0953   Epoch: 0   Global Step: 5960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:19:30,073-Speed 3063.31 samples/sec   Loss 19.8229   LearningRate 0.0953   Epoch: 0   Global Step: 5970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:19:33,369-Speed 3107.92 samples/sec   Loss 19.8636   LearningRate 0.0952   Epoch: 0   Global Step: 5980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:19:36,748-Speed 3031.16 samples/sec   Loss 19.6161   LearningRate 0.0952   Epoch: 0   Global Step: 5990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:19:40,053-Speed 3099.56 samples/sec   Loss 19.7748   LearningRate 0.0952   Epoch: 0   Global Step: 6000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:19:43,367-Speed 3091.30 samples/sec   Loss 19.8355   LearningRate 0.0952   Epoch: 0   Global Step: 6010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:19:46,658-Speed 3112.67 samples/sec   Loss 19.5788   LearningRate 0.0952   Epoch: 0   Global Step: 6020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:19:49,968-Speed 3094.25 samples/sec   Loss 19.7887   LearningRate 0.0952   Epoch: 0   Global Step: 6030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:19:53,301-Speed 3073.13 samples/sec   Loss 19.6441   LearningRate 0.0952   Epoch: 0   Global Step: 6040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:19:56,623-Speed 3083.51 samples/sec   Loss 19.5760   LearningRate 0.0952   Epoch: 0   Global Step: 6050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:19:59,975-Speed 3055.85 samples/sec   Loss 19.7376   LearningRate 0.0952   Epoch: 0   Global Step: 6060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:20:03,313-Speed 3067.89 samples/sec   Loss 19.7200   LearningRate 0.0952   Epoch: 0   Global Step: 6070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:20:06,598-Speed 3118.45 samples/sec   Loss 19.6947   LearningRate 0.0952   Epoch: 0   Global Step: 6080   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:09,867-Speed 3133.97 samples/sec   Loss 19.5458   LearningRate 0.0952   Epoch: 0   Global Step: 6090   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:13,127-Speed 3141.58 samples/sec   Loss 19.8680   LearningRate 0.0951   Epoch: 0   Global Step: 6100   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:16,412-Speed 3118.81 samples/sec   Loss 19.5631   LearningRate 0.0951   Epoch: 0   Global Step: 6110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:19,717-Speed 3099.04 samples/sec   Loss 19.5974   LearningRate 0.0951   Epoch: 0   Global Step: 6120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:23,076-Speed 3048.99 samples/sec   Loss 19.7132   LearningRate 0.0951   Epoch: 0   Global Step: 6130   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:26,392-Speed 3089.16 samples/sec   Loss 19.5789   LearningRate 0.0951   Epoch: 0   Global Step: 6140   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:29,701-Speed 3095.23 samples/sec   Loss 19.5181   LearningRate 0.0951   Epoch: 0   Global Step: 6150   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:33,085-Speed 3027.44 samples/sec   Loss 19.5558   LearningRate 0.0951   Epoch: 0   Global Step: 6160   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:36,385-Speed 3103.66 samples/sec   Loss 19.4829   LearningRate 0.0951   Epoch: 0   Global Step: 6170   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:39,716-Speed 3075.59 samples/sec   Loss 19.4522   LearningRate 0.0951   Epoch: 0   Global Step: 6180   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 02:20:43,005-Speed 3113.75 samples/sec   Loss 19.4170   LearningRate 0.0951   Epoch: 0   Global Step: 6190   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:46,325-Speed 3085.07 samples/sec   Loss 19.4612   LearningRate 0.0951   Epoch: 0   Global Step: 6200   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:49,621-Speed 3108.40 samples/sec   Loss 19.5873   LearningRate 0.0951   Epoch: 0   Global Step: 6210   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:52,946-Speed 3080.75 samples/sec   Loss 19.4750   LearningRate 0.0951   Epoch: 0   Global Step: 6220   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:56,237-Speed 3112.41 samples/sec   Loss 19.5312   LearningRate 0.0950   Epoch: 0   Global Step: 6230   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:20:59,561-Speed 3081.36 samples/sec   Loss 19.4880   LearningRate 0.0950   Epoch: 0   Global Step: 6240   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:02,881-Speed 3085.66 samples/sec   Loss 19.3268   LearningRate 0.0950   Epoch: 0   Global Step: 6250   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:06,259-Speed 3031.63 samples/sec   Loss 19.4976   LearningRate 0.0950   Epoch: 0   Global Step: 6260   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:09,540-Speed 3122.83 samples/sec   Loss 19.3773   LearningRate 0.0950   Epoch: 0   Global Step: 6270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:12,897-Speed 3051.33 samples/sec   Loss 19.5430   LearningRate 0.0950   Epoch: 0   Global Step: 6280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:16,263-Speed 3042.82 samples/sec   Loss 19.4248   LearningRate 0.0950   Epoch: 0   Global Step: 6290   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 02:21:19,631-Speed 3040.83 samples/sec   Loss 19.3737   LearningRate 0.0950   Epoch: 0   Global Step: 6300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:22,993-Speed 3046.80 samples/sec   Loss 19.3043   LearningRate 0.0950   Epoch: 0   Global Step: 6310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:26,307-Speed 3091.01 samples/sec   Loss 19.2820   LearningRate 0.0950   Epoch: 0   Global Step: 6320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:29,627-Speed 3085.41 samples/sec   Loss 19.2460   LearningRate 0.0950   Epoch: 0   Global Step: 6330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:32,948-Speed 3084.15 samples/sec   Loss 19.2436   LearningRate 0.0950   Epoch: 0   Global Step: 6340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:36,241-Speed 3110.85 samples/sec   Loss 19.3404   LearningRate 0.0950   Epoch: 0   Global Step: 6350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:39,530-Speed 3114.70 samples/sec   Loss 19.2873   LearningRate 0.0949   Epoch: 0   Global Step: 6360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:42,866-Speed 3069.71 samples/sec   Loss 19.1819   LearningRate 0.0949   Epoch: 0   Global Step: 6370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:46,198-Speed 3074.28 samples/sec   Loss 19.2888   LearningRate 0.0949   Epoch: 0   Global Step: 6380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:49,475-Speed 3126.17 samples/sec   Loss 19.0305   LearningRate 0.0949   Epoch: 0   Global Step: 6390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:52,779-Speed 3100.11 samples/sec   Loss 19.1540   LearningRate 0.0949   Epoch: 0   Global Step: 6400   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:56,036-Speed 3144.80 samples/sec   Loss 19.2567   LearningRate 0.0949   Epoch: 0   Global Step: 6410   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:21:59,353-Speed 3087.58 samples/sec   Loss 19.3704   LearningRate 0.0949   Epoch: 0   Global Step: 6420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:02,742-Speed 3023.19 samples/sec   Loss 19.2555   LearningRate 0.0949   Epoch: 0   Global Step: 6430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:06,044-Speed 3101.59 samples/sec   Loss 19.2670   LearningRate 0.0949   Epoch: 0   Global Step: 6440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:09,343-Speed 3104.95 samples/sec   Loss 19.2263   LearningRate 0.0949   Epoch: 0   Global Step: 6450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:12,620-Speed 3125.83 samples/sec   Loss 19.3695   LearningRate 0.0949   Epoch: 0   Global Step: 6460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:15,869-Speed 3152.84 samples/sec   Loss 19.2478   LearningRate 0.0949   Epoch: 0   Global Step: 6470   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:19,171-Speed 3102.25 samples/sec   Loss 19.2846   LearningRate 0.0949   Epoch: 0   Global Step: 6480   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:22,475-Speed 3099.98 samples/sec   Loss 18.9674   LearningRate 0.0948   Epoch: 0   Global Step: 6490   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:25,723-Speed 3154.72 samples/sec   Loss 19.0851   LearningRate 0.0948   Epoch: 0   Global Step: 6500   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:29,014-Speed 3112.02 samples/sec   Loss 19.2437   LearningRate 0.0948   Epoch: 0   Global Step: 6510   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:32,305-Speed 3113.22 samples/sec   Loss 19.0568   LearningRate 0.0948   Epoch: 0   Global Step: 6520   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:35,578-Speed 3129.66 samples/sec   Loss 19.1583   LearningRate 0.0948   Epoch: 0   Global Step: 6530   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:38,865-Speed 3115.81 samples/sec   Loss 19.1227   LearningRate 0.0948   Epoch: 0   Global Step: 6540   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:42,146-Speed 3121.41 samples/sec   Loss 19.2857   LearningRate 0.0948   Epoch: 0   Global Step: 6550   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:45,505-Speed 3049.57 samples/sec   Loss 19.1820   LearningRate 0.0948   Epoch: 0   Global Step: 6560   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:48,785-Speed 3123.23 samples/sec   Loss 18.9996   LearningRate 0.0948   Epoch: 0   Global Step: 6570   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:52,139-Speed 3054.06 samples/sec   Loss 19.1636   LearningRate 0.0948   Epoch: 0   Global Step: 6580   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:55,459-Speed 3085.68 samples/sec   Loss 19.1632   LearningRate 0.0948   Epoch: 0   Global Step: 6590   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:22:58,807-Speed 3058.41 samples/sec   Loss 19.0658   LearningRate 0.0948   Epoch: 0   Global Step: 6600   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 02:23:02,153-Speed 3061.99 samples/sec   Loss 19.0382   LearningRate 0.0947   Epoch: 0   Global Step: 6610   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:05,488-Speed 3071.30 samples/sec   Loss 19.2107   LearningRate 0.0947   Epoch: 0   Global Step: 6620   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:08,793-Speed 3099.34 samples/sec   Loss 19.2471   LearningRate 0.0947   Epoch: 0   Global Step: 6630   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:12,107-Speed 3090.06 samples/sec   Loss 18.9455   LearningRate 0.0947   Epoch: 0   Global Step: 6640   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:15,391-Speed 3119.59 samples/sec   Loss 18.9784   LearningRate 0.0947   Epoch: 0   Global Step: 6650   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:18,697-Speed 3098.57 samples/sec   Loss 19.1930   LearningRate 0.0947   Epoch: 0   Global Step: 6660   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:22,079-Speed 3028.48 samples/sec   Loss 19.1510   LearningRate 0.0947   Epoch: 0   Global Step: 6670   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:25,433-Speed 3053.94 samples/sec   Loss 19.2007   LearningRate 0.0947   Epoch: 0   Global Step: 6680   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:28,788-Speed 3053.42 samples/sec   Loss 19.1713   LearningRate 0.0947   Epoch: 0   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:32,060-Speed 3130.30 samples/sec   Loss 19.0521   LearningRate 0.0947   Epoch: 0   Global Step: 6700   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:35,343-Speed 3119.43 samples/sec   Loss 18.8403   LearningRate 0.0947   Epoch: 0   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:38,659-Speed 3089.66 samples/sec   Loss 18.9383   LearningRate 0.0947   Epoch: 0   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:41,968-Speed 3094.60 samples/sec   Loss 19.0378   LearningRate 0.0947   Epoch: 0   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:45,251-Speed 3120.17 samples/sec   Loss 18.9137   LearningRate 0.0946   Epoch: 0   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:48,554-Speed 3101.85 samples/sec   Loss 18.9992   LearningRate 0.0946   Epoch: 0   Global Step: 6750   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:51,862-Speed 3096.42 samples/sec   Loss 19.0133   LearningRate 0.0946   Epoch: 0   Global Step: 6760   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:55,211-Speed 3058.27 samples/sec   Loss 19.0012   LearningRate 0.0946   Epoch: 0   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:23:59,230-Speed 2548.59 samples/sec   Loss 18.7555   LearningRate 0.0946   Epoch: 0   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:02,534-Speed 3100.49 samples/sec   Loss 18.9308   LearningRate 0.0946   Epoch: 0   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:05,872-Speed 3068.34 samples/sec   Loss 18.8846   LearningRate 0.0946   Epoch: 0   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:11,167-Speed 1934.03 samples/sec   Loss 19.0349   LearningRate 0.0946   Epoch: 0   Global Step: 6810   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:14,568-Speed 3012.45 samples/sec   Loss 18.9321   LearningRate 0.0946   Epoch: 0   Global Step: 6820   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:17,896-Speed 3078.36 samples/sec   Loss 18.8400   LearningRate 0.0946   Epoch: 0   Global Step: 6830   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:21,176-Speed 3122.53 samples/sec   Loss 18.8680   LearningRate 0.0946   Epoch: 0   Global Step: 6840   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:24,482-Speed 3098.77 samples/sec   Loss 18.8512   LearningRate 0.0946   Epoch: 0   Global Step: 6850   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:27,840-Speed 3050.48 samples/sec   Loss 19.1030   LearningRate 0.0946   Epoch: 0   Global Step: 6860   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:31,158-Speed 3087.41 samples/sec   Loss 19.0234   LearningRate 0.0945   Epoch: 0   Global Step: 6870   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:34,472-Speed 3090.32 samples/sec   Loss 18.7708   LearningRate 0.0945   Epoch: 0   Global Step: 6880   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:37,764-Speed 3111.58 samples/sec   Loss 18.8108   LearningRate 0.0945   Epoch: 0   Global Step: 6890   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:41,132-Speed 3041.59 samples/sec   Loss 19.0234   LearningRate 0.0945   Epoch: 0   Global Step: 6900   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:44,483-Speed 3057.71 samples/sec   Loss 18.8013   LearningRate 0.0945   Epoch: 0   Global Step: 6910   Fp16 Grad Scale: 262144   Required: 23 hours
Training: 2022-04-27 02:24:47,760-Speed 3125.70 samples/sec   Loss 18.8203   LearningRate 0.0945   Epoch: 0   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:51,077-Speed 3088.24 samples/sec   Loss 18.8103   LearningRate 0.0945   Epoch: 0   Global Step: 6930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:54,386-Speed 3095.18 samples/sec   Loss 18.8100   LearningRate 0.0945   Epoch: 0   Global Step: 6940   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:24:57,766-Speed 3031.30 samples/sec   Loss 18.8650   LearningRate 0.0945   Epoch: 0   Global Step: 6950   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:25:01,031-Speed 3137.04 samples/sec   Loss 18.7353   LearningRate 0.0945   Epoch: 0   Global Step: 6960   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:25:04,327-Speed 3107.21 samples/sec   Loss 18.7551   LearningRate 0.0945   Epoch: 0   Global Step: 6970   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:25:07,667-Speed 3066.57 samples/sec   Loss 18.6478   LearningRate 0.0945   Epoch: 0   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:25:11,002-Speed 3071.94 samples/sec   Loss 18.7904   LearningRate 0.0945   Epoch: 0   Global Step: 6990   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:25:14,292-Speed 3113.98 samples/sec   Loss 18.6178   LearningRate 0.0944   Epoch: 0   Global Step: 7000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:25:17,629-Speed 3068.95 samples/sec   Loss 19.0064   LearningRate 0.0944   Epoch: 0   Global Step: 7010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:25:20,952-Speed 3082.48 samples/sec   Loss 18.7119   LearningRate 0.0944   Epoch: 0   Global Step: 7020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:25:24,279-Speed 3078.54 samples/sec   Loss 18.5394   LearningRate 0.0944   Epoch: 0   Global Step: 7030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:25:27,561-Speed 3121.39 samples/sec   Loss 18.5314   LearningRate 0.0944   Epoch: 0   Global Step: 7040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:25:30,979-Speed 2996.73 samples/sec   Loss 18.6692   LearningRate 0.0944   Epoch: 0   Global Step: 7050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:25:34,313-Speed 3072.20 samples/sec   Loss 18.7178   LearningRate 0.0944   Epoch: 0   Global Step: 7060   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:25:37,564-Speed 3150.91 samples/sec   Loss 18.5977   LearningRate 0.0944   Epoch: 0   Global Step: 7070   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:25:40,833-Speed 3133.59 samples/sec   Loss 18.8118   LearningRate 0.0944   Epoch: 0   Global Step: 7080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:25:44,129-Speed 3108.03 samples/sec   Loss 18.7536   LearningRate 0.0944   Epoch: 0   Global Step: 7090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:25:47,459-Speed 3075.64 samples/sec   Loss 18.6429   LearningRate 0.0944   Epoch: 0   Global Step: 7100   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:25:50,795-Speed 3070.64 samples/sec   Loss 18.5319   LearningRate 0.0944   Epoch: 0   Global Step: 7110   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:25:54,134-Speed 3067.29 samples/sec   Loss 18.7066   LearningRate 0.0943   Epoch: 0   Global Step: 7120   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:25:57,484-Speed 3058.11 samples/sec   Loss 18.5186   LearningRate 0.0943   Epoch: 0   Global Step: 7130   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:26:00,815-Speed 3074.83 samples/sec   Loss 18.5296   LearningRate 0.0943   Epoch: 0   Global Step: 7140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:26:04,155-Speed 3067.09 samples/sec   Loss 18.5767   LearningRate 0.0943   Epoch: 0   Global Step: 7150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:26:07,503-Speed 3059.45 samples/sec   Loss 18.7272   LearningRate 0.0943   Epoch: 0   Global Step: 7160   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:26:10,807-Speed 3099.95 samples/sec   Loss 18.7411   LearningRate 0.0943   Epoch: 0   Global Step: 7170   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:26:14,069-Speed 3139.91 samples/sec   Loss 18.6171   LearningRate 0.0943   Epoch: 0   Global Step: 7180   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:26:17,417-Speed 3060.03 samples/sec   Loss 18.5394   LearningRate 0.0943   Epoch: 0   Global Step: 7190   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:26:20,765-Speed 3058.87 samples/sec   Loss 18.5701   LearningRate 0.0943   Epoch: 0   Global Step: 7200   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:26:24,104-Speed 3068.41 samples/sec   Loss 18.6500   LearningRate 0.0943   Epoch: 0   Global Step: 7210   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:26:27,371-Speed 3134.98 samples/sec   Loss 18.5264   LearningRate 0.0943   Epoch: 0   Global Step: 7220   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:26:30,706-Speed 3071.25 samples/sec   Loss 18.4681   LearningRate 0.0943   Epoch: 0   Global Step: 7230   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 02:26:34,084-Speed 3032.40 samples/sec   Loss 18.5810   LearningRate 0.0943   Epoch: 0   Global Step: 7240   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:26:37,413-Speed 3076.73 samples/sec   Loss 18.3736   LearningRate 0.0942   Epoch: 0   Global Step: 7250   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:26:40,724-Speed 3093.26 samples/sec   Loss 18.4527   LearningRate 0.0942   Epoch: 0   Global Step: 7260   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:26:44,072-Speed 3059.64 samples/sec   Loss 18.5230   LearningRate 0.0942   Epoch: 0   Global Step: 7270   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:26:47,486-Speed 3000.62 samples/sec   Loss 18.4635   LearningRate 0.0942   Epoch: 0   Global Step: 7280   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:26:50,869-Speed 3028.26 samples/sec   Loss 18.4608   LearningRate 0.0942   Epoch: 0   Global Step: 7290   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:26:54,193-Speed 3080.99 samples/sec   Loss 18.3017   LearningRate 0.0942   Epoch: 0   Global Step: 7300   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:26:57,539-Speed 3061.26 samples/sec   Loss 18.5489   LearningRate 0.0942   Epoch: 0   Global Step: 7310   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:00,875-Speed 3070.68 samples/sec   Loss 18.4116   LearningRate 0.0942   Epoch: 0   Global Step: 7320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:04,237-Speed 3046.90 samples/sec   Loss 18.3576   LearningRate 0.0942   Epoch: 0   Global Step: 7330   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:07,563-Speed 3079.69 samples/sec   Loss 18.4663   LearningRate 0.0942   Epoch: 0   Global Step: 7340   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:10,880-Speed 3087.88 samples/sec   Loss 18.3471   LearningRate 0.0942   Epoch: 0   Global Step: 7350   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:14,144-Speed 3138.23 samples/sec   Loss 18.5248   LearningRate 0.0942   Epoch: 0   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:17,415-Speed 3131.78 samples/sec   Loss 18.3356   LearningRate 0.0942   Epoch: 0   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:20,688-Speed 3129.47 samples/sec   Loss 18.3940   LearningRate 0.0941   Epoch: 0   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:23,978-Speed 3112.90 samples/sec   Loss 18.3422   LearningRate 0.0941   Epoch: 0   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:27,363-Speed 3025.93 samples/sec   Loss 18.4069   LearningRate 0.0941   Epoch: 0   Global Step: 7400   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:30,646-Speed 3120.77 samples/sec   Loss 18.3746   LearningRate 0.0941   Epoch: 0   Global Step: 7410   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:33,950-Speed 3100.29 samples/sec   Loss 18.2429   LearningRate 0.0941   Epoch: 0   Global Step: 7420   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:37,238-Speed 3114.68 samples/sec   Loss 18.3878   LearningRate 0.0941   Epoch: 0   Global Step: 7430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:40,562-Speed 3082.13 samples/sec   Loss 18.1791   LearningRate 0.0941   Epoch: 0   Global Step: 7440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:43,922-Speed 3048.37 samples/sec   Loss 18.3116   LearningRate 0.0941   Epoch: 0   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:47,253-Speed 3074.87 samples/sec   Loss 18.2192   LearningRate 0.0941   Epoch: 0   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-04-27 02:27:50,545-Speed 3111.57 samples/sec   Loss 18.3357   LearningRate 0.0941   Epoch: 0   Global Step: 7470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:27:53,863-Speed 3086.74 samples/sec   Loss 18.5241   LearningRate 0.0941   Epoch: 0   Global Step: 7480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:27:57,239-Speed 3034.44 samples/sec   Loss 18.3231   LearningRate 0.0941   Epoch: 0   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:00,616-Speed 3033.12 samples/sec   Loss 18.2861   LearningRate 0.0941   Epoch: 0   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:03,958-Speed 3065.14 samples/sec   Loss 18.2909   LearningRate 0.0940   Epoch: 0   Global Step: 7510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:07,320-Speed 3047.22 samples/sec   Loss 18.4021   LearningRate 0.0940   Epoch: 0   Global Step: 7520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:10,662-Speed 3064.57 samples/sec   Loss 18.3555   LearningRate 0.0940   Epoch: 0   Global Step: 7530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:13,974-Speed 3092.71 samples/sec   Loss 18.3429   LearningRate 0.0940   Epoch: 0   Global Step: 7540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:17,267-Speed 3110.69 samples/sec   Loss 18.4042   LearningRate 0.0940   Epoch: 0   Global Step: 7550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:20,641-Speed 3036.18 samples/sec   Loss 18.2926   LearningRate 0.0940   Epoch: 0   Global Step: 7560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:23,927-Speed 3117.25 samples/sec   Loss 18.3246   LearningRate 0.0940   Epoch: 0   Global Step: 7570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:27,243-Speed 3088.86 samples/sec   Loss 18.3415   LearningRate 0.0940   Epoch: 0   Global Step: 7580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:30,600-Speed 3051.73 samples/sec   Loss 18.5134   LearningRate 0.0940   Epoch: 0   Global Step: 7590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:33,975-Speed 3034.70 samples/sec   Loss 18.4641   LearningRate 0.0940   Epoch: 0   Global Step: 7600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:37,308-Speed 3073.66 samples/sec   Loss 18.3261   LearningRate 0.0940   Epoch: 0   Global Step: 7610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:40,616-Speed 3096.66 samples/sec   Loss 18.3400   LearningRate 0.0940   Epoch: 0   Global Step: 7620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:43,983-Speed 3041.98 samples/sec   Loss 18.2671   LearningRate 0.0940   Epoch: 0   Global Step: 7630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:47,279-Speed 3108.17 samples/sec   Loss 18.1149   LearningRate 0.0939   Epoch: 0   Global Step: 7640   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 02:28:50,588-Speed 3095.56 samples/sec   Loss 18.2471   LearningRate 0.0939   Epoch: 0   Global Step: 7650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:53,917-Speed 3077.36 samples/sec   Loss 18.2229   LearningRate 0.0939   Epoch: 0   Global Step: 7660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:28:57,202-Speed 3117.24 samples/sec   Loss 18.2652   LearningRate 0.0939   Epoch: 0   Global Step: 7670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:29:00,508-Speed 3098.97 samples/sec   Loss 18.4120   LearningRate 0.0939   Epoch: 0   Global Step: 7680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:29:03,808-Speed 3103.32 samples/sec   Loss 18.2114   LearningRate 0.0939   Epoch: 0   Global Step: 7690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:29:07,094-Speed 3117.17 samples/sec   Loss 18.2054   LearningRate 0.0939   Epoch: 0   Global Step: 7700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:29:10,358-Speed 3138.96 samples/sec   Loss 18.2410   LearningRate 0.0939   Epoch: 0   Global Step: 7710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:29:13,696-Speed 3068.52 samples/sec   Loss 18.2278   LearningRate 0.0939   Epoch: 0   Global Step: 7720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:29:17,021-Speed 3080.57 samples/sec   Loss 18.3821   LearningRate 0.0939   Epoch: 0   Global Step: 7730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:29:20,327-Speed 3098.66 samples/sec   Loss 18.2573   LearningRate 0.0939   Epoch: 0   Global Step: 7740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:29:23,653-Speed 3079.34 samples/sec   Loss 18.3979   LearningRate 0.0939   Epoch: 0   Global Step: 7750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:29:27,019-Speed 3043.73 samples/sec   Loss 18.2032   LearningRate 0.0939   Epoch: 0   Global Step: 7760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:29:30,312-Speed 3110.36 samples/sec   Loss 18.1230   LearningRate 0.0938   Epoch: 0   Global Step: 7770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:29:33,657-Speed 3061.93 samples/sec   Loss 18.1221   LearningRate 0.0938   Epoch: 0   Global Step: 7780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:29:36,936-Speed 3123.90 samples/sec   Loss 18.1456   LearningRate 0.0938   Epoch: 0   Global Step: 7790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:29:40,258-Speed 3083.18 samples/sec   Loss 18.1196   LearningRate 0.0938   Epoch: 0   Global Step: 7800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:29:43,575-Speed 3087.56 samples/sec   Loss 18.0998   LearningRate 0.0938   Epoch: 0   Global Step: 7810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:29:46,906-Speed 3075.62 samples/sec   Loss 18.2451   LearningRate 0.0938   Epoch: 0   Global Step: 7820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:29:50,284-Speed 3032.53 samples/sec   Loss 18.1917   LearningRate 0.0938   Epoch: 0   Global Step: 7830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:29:53,602-Speed 3086.67 samples/sec   Loss 18.1072   LearningRate 0.0938   Epoch: 0   Global Step: 7840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:29:56,922-Speed 3085.54 samples/sec   Loss 18.0981   LearningRate 0.0938   Epoch: 0   Global Step: 7850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:00,255-Speed 3073.25 samples/sec   Loss 18.0270   LearningRate 0.0938   Epoch: 0   Global Step: 7860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:03,585-Speed 3076.28 samples/sec   Loss 18.1079   LearningRate 0.0938   Epoch: 0   Global Step: 7870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:06,943-Speed 3049.32 samples/sec   Loss 18.0615   LearningRate 0.0938   Epoch: 0   Global Step: 7880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:10,255-Speed 3093.79 samples/sec   Loss 18.2506   LearningRate 0.0937   Epoch: 0   Global Step: 7890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:13,600-Speed 3061.65 samples/sec   Loss 18.0253   LearningRate 0.0937   Epoch: 0   Global Step: 7900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:16,885-Speed 3119.11 samples/sec   Loss 18.1659   LearningRate 0.0937   Epoch: 0   Global Step: 7910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:20,158-Speed 3129.68 samples/sec   Loss 18.1782   LearningRate 0.0937   Epoch: 0   Global Step: 7920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:23,596-Speed 2979.79 samples/sec   Loss 18.0803   LearningRate 0.0937   Epoch: 0   Global Step: 7930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:26,913-Speed 3087.91 samples/sec   Loss 18.0765   LearningRate 0.0937   Epoch: 0   Global Step: 7940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:30,337-Speed 2991.56 samples/sec   Loss 17.9781   LearningRate 0.0937   Epoch: 0   Global Step: 7950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:33,634-Speed 3107.05 samples/sec   Loss 18.0380   LearningRate 0.0937   Epoch: 0   Global Step: 7960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:37,011-Speed 3033.19 samples/sec   Loss 17.9751   LearningRate 0.0937   Epoch: 0   Global Step: 7970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:40,406-Speed 3016.96 samples/sec   Loss 17.8670   LearningRate 0.0937   Epoch: 0   Global Step: 7980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:43,673-Speed 3135.71 samples/sec   Loss 18.1273   LearningRate 0.0937   Epoch: 0   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:46,941-Speed 3133.81 samples/sec   Loss 17.9458   LearningRate 0.0937   Epoch: 0   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:50,296-Speed 3053.49 samples/sec   Loss 17.9314   LearningRate 0.0937   Epoch: 0   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:53,621-Speed 3080.09 samples/sec   Loss 17.8991   LearningRate 0.0936   Epoch: 0   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:30:56,953-Speed 3074.00 samples/sec   Loss 17.9511   LearningRate 0.0936   Epoch: 0   Global Step: 8030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:00,295-Speed 3065.11 samples/sec   Loss 17.7489   LearningRate 0.0936   Epoch: 0   Global Step: 8040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:03,629-Speed 3073.02 samples/sec   Loss 17.9536   LearningRate 0.0936   Epoch: 0   Global Step: 8050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:06,906-Speed 3125.03 samples/sec   Loss 18.1116   LearningRate 0.0936   Epoch: 0   Global Step: 8060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:10,236-Speed 3076.63 samples/sec   Loss 17.8576   LearningRate 0.0936   Epoch: 0   Global Step: 8070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:13,566-Speed 3075.37 samples/sec   Loss 17.9977   LearningRate 0.0936   Epoch: 0   Global Step: 8080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:17,002-Speed 2984.55 samples/sec   Loss 17.9348   LearningRate 0.0936   Epoch: 0   Global Step: 8090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:20,349-Speed 3060.10 samples/sec   Loss 17.9370   LearningRate 0.0936   Epoch: 0   Global Step: 8100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:23,701-Speed 3055.68 samples/sec   Loss 17.7131   LearningRate 0.0936   Epoch: 0   Global Step: 8110   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 02:31:27,019-Speed 3087.59 samples/sec   Loss 17.8274   LearningRate 0.0936   Epoch: 0   Global Step: 8120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:30,304-Speed 3118.29 samples/sec   Loss 18.0532   LearningRate 0.0936   Epoch: 0   Global Step: 8130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:33,646-Speed 3064.52 samples/sec   Loss 17.9400   LearningRate 0.0936   Epoch: 0   Global Step: 8140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:37,003-Speed 3051.32 samples/sec   Loss 17.8820   LearningRate 0.0935   Epoch: 0   Global Step: 8150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:40,379-Speed 3034.43 samples/sec   Loss 17.7226   LearningRate 0.0935   Epoch: 0   Global Step: 8160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:43,718-Speed 3067.33 samples/sec   Loss 17.8105   LearningRate 0.0935   Epoch: 0   Global Step: 8170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:47,026-Speed 3096.86 samples/sec   Loss 17.6787   LearningRate 0.0935   Epoch: 0   Global Step: 8180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:50,342-Speed 3088.90 samples/sec   Loss 17.8503   LearningRate 0.0935   Epoch: 0   Global Step: 8190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:53,671-Speed 3076.16 samples/sec   Loss 17.8931   LearningRate 0.0935   Epoch: 0   Global Step: 8200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:31:57,001-Speed 3076.37 samples/sec   Loss 17.9700   LearningRate 0.0935   Epoch: 0   Global Step: 8210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:32:00,295-Speed 3110.05 samples/sec   Loss 17.9094   LearningRate 0.0935   Epoch: 0   Global Step: 8220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:32:03,616-Speed 3083.63 samples/sec   Loss 17.8457   LearningRate 0.0935   Epoch: 0   Global Step: 8230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:32:06,943-Speed 3079.00 samples/sec   Loss 17.6947   LearningRate 0.0935   Epoch: 0   Global Step: 8240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:32:10,318-Speed 3035.06 samples/sec   Loss 17.7821   LearningRate 0.0935   Epoch: 0   Global Step: 8250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:32:13,608-Speed 3113.33 samples/sec   Loss 17.8891   LearningRate 0.0935   Epoch: 0   Global Step: 8260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:32:16,975-Speed 3042.73 samples/sec   Loss 18.0742   LearningRate 0.0935   Epoch: 0   Global Step: 8270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:32:20,332-Speed 3051.40 samples/sec   Loss 17.7631   LearningRate 0.0934   Epoch: 0   Global Step: 8280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:32:23,640-Speed 3095.46 samples/sec   Loss 17.7534   LearningRate 0.0934   Epoch: 0   Global Step: 8290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:32:26,924-Speed 3119.94 samples/sec   Loss 17.9339   LearningRate 0.0934   Epoch: 0   Global Step: 8300   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:32:30,182-Speed 3143.90 samples/sec   Loss 17.8871   LearningRate 0.0934   Epoch: 0   Global Step: 8310   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:32:33,468-Speed 3116.40 samples/sec   Loss 17.7501   LearningRate 0.0934   Epoch: 0   Global Step: 8320   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:32:36,799-Speed 3075.66 samples/sec   Loss 17.5689   LearningRate 0.0934   Epoch: 0   Global Step: 8330   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:32:40,096-Speed 3106.42 samples/sec   Loss 17.7876   LearningRate 0.0934   Epoch: 0   Global Step: 8340   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:32:43,399-Speed 3101.47 samples/sec   Loss 17.8739   LearningRate 0.0934   Epoch: 0   Global Step: 8350   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:32:46,764-Speed 3043.39 samples/sec   Loss 17.8030   LearningRate 0.0934   Epoch: 0   Global Step: 8360   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:32:50,074-Speed 3094.63 samples/sec   Loss 17.7812   LearningRate 0.0934   Epoch: 0   Global Step: 8370   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:32:53,372-Speed 3105.89 samples/sec   Loss 17.7957   LearningRate 0.0934   Epoch: 0   Global Step: 8380   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:32:56,769-Speed 3015.43 samples/sec   Loss 18.0063   LearningRate 0.0934   Epoch: 0   Global Step: 8390   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:33:00,087-Speed 3086.74 samples/sec   Loss 17.8203   LearningRate 0.0934   Epoch: 0   Global Step: 8400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:03,376-Speed 3115.26 samples/sec   Loss 17.8233   LearningRate 0.0933   Epoch: 0   Global Step: 8410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:06,711-Speed 3070.71 samples/sec   Loss 17.9071   LearningRate 0.0933   Epoch: 0   Global Step: 8420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:10,071-Speed 3048.80 samples/sec   Loss 17.6880   LearningRate 0.0933   Epoch: 0   Global Step: 8430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:13,440-Speed 3040.21 samples/sec   Loss 17.7228   LearningRate 0.0933   Epoch: 0   Global Step: 8440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:16,770-Speed 3075.62 samples/sec   Loss 17.7492   LearningRate 0.0933   Epoch: 0   Global Step: 8450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:20,029-Speed 3142.79 samples/sec   Loss 17.6631   LearningRate 0.0933   Epoch: 0   Global Step: 8460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:23,388-Speed 3049.78 samples/sec   Loss 17.6342   LearningRate 0.0933   Epoch: 0   Global Step: 8470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:26,758-Speed 3042.26 samples/sec   Loss 17.7281   LearningRate 0.0933   Epoch: 0   Global Step: 8480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:30,102-Speed 3063.07 samples/sec   Loss 17.5854   LearningRate 0.0933   Epoch: 0   Global Step: 8490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:33,434-Speed 3074.62 samples/sec   Loss 17.7721   LearningRate 0.0933   Epoch: 0   Global Step: 8500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:33:36,759-Speed 3079.93 samples/sec   Loss 17.8680   LearningRate 0.0933   Epoch: 0   Global Step: 8510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:40,056-Speed 3107.64 samples/sec   Loss 17.6757   LearningRate 0.0933   Epoch: 0   Global Step: 8520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:43,384-Speed 3078.31 samples/sec   Loss 17.8306   LearningRate 0.0933   Epoch: 0   Global Step: 8530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:46,693-Speed 3095.47 samples/sec   Loss 17.8835   LearningRate 0.0932   Epoch: 0   Global Step: 8540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:49,964-Speed 3130.73 samples/sec   Loss 17.6950   LearningRate 0.0932   Epoch: 0   Global Step: 8550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:53,290-Speed 3079.98 samples/sec   Loss 17.8467   LearningRate 0.0932   Epoch: 0   Global Step: 8560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:56,573-Speed 3122.22 samples/sec   Loss 17.6042   LearningRate 0.0932   Epoch: 0   Global Step: 8570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:33:59,855-Speed 3121.23 samples/sec   Loss 17.6514   LearningRate 0.0932   Epoch: 0   Global Step: 8580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:34:03,151-Speed 3107.92 samples/sec   Loss 17.8674   LearningRate 0.0932   Epoch: 0   Global Step: 8590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:34:06,482-Speed 3075.06 samples/sec   Loss 17.6041   LearningRate 0.0932   Epoch: 0   Global Step: 8600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:34:09,791-Speed 3095.17 samples/sec   Loss 17.6631   LearningRate 0.0932   Epoch: 0   Global Step: 8610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:13,122-Speed 3075.11 samples/sec   Loss 17.6040   LearningRate 0.0932   Epoch: 0   Global Step: 8620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:16,407-Speed 3118.31 samples/sec   Loss 17.5257   LearningRate 0.0932   Epoch: 0   Global Step: 8630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:19,733-Speed 3079.27 samples/sec   Loss 17.7188   LearningRate 0.0932   Epoch: 0   Global Step: 8640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:23,024-Speed 3112.60 samples/sec   Loss 17.5425   LearningRate 0.0932   Epoch: 0   Global Step: 8650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:26,304-Speed 3122.82 samples/sec   Loss 17.6552   LearningRate 0.0931   Epoch: 0   Global Step: 8660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:29,582-Speed 3125.08 samples/sec   Loss 17.5445   LearningRate 0.0931   Epoch: 0   Global Step: 8670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:32,944-Speed 3046.28 samples/sec   Loss 17.6634   LearningRate 0.0931   Epoch: 0   Global Step: 8680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:36,306-Speed 3047.13 samples/sec   Loss 17.7452   LearningRate 0.0931   Epoch: 0   Global Step: 8690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:39,635-Speed 3076.83 samples/sec   Loss 17.5910   LearningRate 0.0931   Epoch: 0   Global Step: 8700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:42,948-Speed 3092.04 samples/sec   Loss 17.5204   LearningRate 0.0931   Epoch: 0   Global Step: 8710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:46,334-Speed 3025.21 samples/sec   Loss 17.5473   LearningRate 0.0931   Epoch: 0   Global Step: 8720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:49,721-Speed 3024.29 samples/sec   Loss 17.5160   LearningRate 0.0931   Epoch: 0   Global Step: 8730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:53,054-Speed 3073.20 samples/sec   Loss 17.5651   LearningRate 0.0931   Epoch: 0   Global Step: 8740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:56,424-Speed 3039.09 samples/sec   Loss 17.6578   LearningRate 0.0931   Epoch: 0   Global Step: 8750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:34:59,793-Speed 3040.44 samples/sec   Loss 17.5162   LearningRate 0.0931   Epoch: 0   Global Step: 8760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:03,110-Speed 3088.16 samples/sec   Loss 17.5835   LearningRate 0.0931   Epoch: 0   Global Step: 8770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:06,481-Speed 3038.43 samples/sec   Loss 17.3657   LearningRate 0.0931   Epoch: 0   Global Step: 8780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:09,794-Speed 3092.38 samples/sec   Loss 17.4742   LearningRate 0.0930   Epoch: 0   Global Step: 8790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:13,142-Speed 3058.91 samples/sec   Loss 17.4236   LearningRate 0.0930   Epoch: 0   Global Step: 8800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:16,410-Speed 3134.73 samples/sec   Loss 17.4190   LearningRate 0.0930   Epoch: 0   Global Step: 8810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:19,738-Speed 3077.69 samples/sec   Loss 17.5548   LearningRate 0.0930   Epoch: 0   Global Step: 8820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:23,018-Speed 3122.45 samples/sec   Loss 17.6469   LearningRate 0.0930   Epoch: 0   Global Step: 8830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:26,368-Speed 3058.27 samples/sec   Loss 17.4753   LearningRate 0.0930   Epoch: 0   Global Step: 8840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:29,689-Speed 3083.86 samples/sec   Loss 17.5945   LearningRate 0.0930   Epoch: 0   Global Step: 8850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:33,026-Speed 3069.59 samples/sec   Loss 17.5903   LearningRate 0.0930   Epoch: 0   Global Step: 8860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:36,438-Speed 3002.41 samples/sec   Loss 17.5442   LearningRate 0.0930   Epoch: 0   Global Step: 8870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:39,794-Speed 3051.93 samples/sec   Loss 17.3629   LearningRate 0.0930   Epoch: 0   Global Step: 8880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:43,166-Speed 3037.02 samples/sec   Loss 17.3853   LearningRate 0.0930   Epoch: 0   Global Step: 8890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:46,539-Speed 3037.57 samples/sec   Loss 17.6210   LearningRate 0.0930   Epoch: 0   Global Step: 8900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:49,798-Speed 3142.95 samples/sec   Loss 17.4884   LearningRate 0.0930   Epoch: 0   Global Step: 8910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:53,141-Speed 3064.25 samples/sec   Loss 17.5149   LearningRate 0.0929   Epoch: 0   Global Step: 8920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:56,429-Speed 3115.36 samples/sec   Loss 17.4718   LearningRate 0.0929   Epoch: 0   Global Step: 8930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:35:59,701-Speed 3130.54 samples/sec   Loss 17.4764   LearningRate 0.0929   Epoch: 0   Global Step: 8940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:03,060-Speed 3049.25 samples/sec   Loss 17.4505   LearningRate 0.0929   Epoch: 0   Global Step: 8950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:06,369-Speed 3095.62 samples/sec   Loss 17.3235   LearningRate 0.0929   Epoch: 0   Global Step: 8960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:09,767-Speed 3014.86 samples/sec   Loss 17.3374   LearningRate 0.0929   Epoch: 0   Global Step: 8970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:13,092-Speed 3080.29 samples/sec   Loss 17.5221   LearningRate 0.0929   Epoch: 0   Global Step: 8980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:16,406-Speed 3092.40 samples/sec   Loss 17.5774   LearningRate 0.0929   Epoch: 0   Global Step: 8990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:19,717-Speed 3093.88 samples/sec   Loss 17.4273   LearningRate 0.0929   Epoch: 0   Global Step: 9000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:22,996-Speed 3123.71 samples/sec   Loss 17.5751   LearningRate 0.0929   Epoch: 0   Global Step: 9010   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 02:36:26,280-Speed 3119.12 samples/sec   Loss 17.5250   LearningRate 0.0929   Epoch: 0   Global Step: 9020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:29,684-Speed 3009.03 samples/sec   Loss 17.3668   LearningRate 0.0929   Epoch: 0   Global Step: 9030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:32,992-Speed 3096.45 samples/sec   Loss 17.4046   LearningRate 0.0929   Epoch: 0   Global Step: 9040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:36,250-Speed 3143.62 samples/sec   Loss 17.2880   LearningRate 0.0928   Epoch: 0   Global Step: 9050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:39,534-Speed 3119.38 samples/sec   Loss 17.1728   LearningRate 0.0928   Epoch: 0   Global Step: 9060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:42,882-Speed 3059.00 samples/sec   Loss 17.2746   LearningRate 0.0928   Epoch: 0   Global Step: 9070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:46,196-Speed 3091.40 samples/sec   Loss 17.3934   LearningRate 0.0928   Epoch: 0   Global Step: 9080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:49,592-Speed 3016.44 samples/sec   Loss 17.4262   LearningRate 0.0928   Epoch: 0   Global Step: 9090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:52,936-Speed 3062.67 samples/sec   Loss 17.3712   LearningRate 0.0928   Epoch: 0   Global Step: 9100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:56,293-Speed 3051.04 samples/sec   Loss 17.5027   LearningRate 0.0928   Epoch: 0   Global Step: 9110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:36:59,690-Speed 3015.13 samples/sec   Loss 17.3492   LearningRate 0.0928   Epoch: 0   Global Step: 9120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:03,046-Speed 3052.74 samples/sec   Loss 17.4759   LearningRate 0.0928   Epoch: 0   Global Step: 9130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:06,367-Speed 3084.13 samples/sec   Loss 17.3353   LearningRate 0.0928   Epoch: 0   Global Step: 9140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:09,668-Speed 3102.50 samples/sec   Loss 17.3069   LearningRate 0.0928   Epoch: 0   Global Step: 9150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:12,953-Speed 3118.78 samples/sec   Loss 17.1643   LearningRate 0.0928   Epoch: 0   Global Step: 9160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:16,312-Speed 3049.51 samples/sec   Loss 17.3288   LearningRate 0.0928   Epoch: 0   Global Step: 9170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:19,596-Speed 3118.52 samples/sec   Loss 17.2242   LearningRate 0.0927   Epoch: 0   Global Step: 9180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:22,949-Speed 3055.27 samples/sec   Loss 17.3271   LearningRate 0.0927   Epoch: 0   Global Step: 9190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:26,296-Speed 3060.52 samples/sec   Loss 17.4150   LearningRate 0.0927   Epoch: 0   Global Step: 9200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:29,574-Speed 3124.35 samples/sec   Loss 17.2778   LearningRate 0.0927   Epoch: 0   Global Step: 9210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:32,888-Speed 3090.63 samples/sec   Loss 17.3240   LearningRate 0.0927   Epoch: 0   Global Step: 9220   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 02:37:36,194-Speed 3098.38 samples/sec   Loss 17.4122   LearningRate 0.0927   Epoch: 0   Global Step: 9230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:39,491-Speed 3107.42 samples/sec   Loss 17.2904   LearningRate 0.0927   Epoch: 0   Global Step: 9240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:42,836-Speed 3061.90 samples/sec   Loss 17.3413   LearningRate 0.0927   Epoch: 0   Global Step: 9250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:46,143-Speed 3097.58 samples/sec   Loss 17.0539   LearningRate 0.0927   Epoch: 0   Global Step: 9260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:49,513-Speed 3039.60 samples/sec   Loss 17.2606   LearningRate 0.0927   Epoch: 0   Global Step: 9270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:52,851-Speed 3068.10 samples/sec   Loss 17.2243   LearningRate 0.0927   Epoch: 0   Global Step: 9280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:56,141-Speed 3113.94 samples/sec   Loss 17.4753   LearningRate 0.0927   Epoch: 0   Global Step: 9290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:37:59,528-Speed 3024.09 samples/sec   Loss 17.1644   LearningRate 0.0927   Epoch: 0   Global Step: 9300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:38:02,860-Speed 3074.57 samples/sec   Loss 17.2562   LearningRate 0.0926   Epoch: 0   Global Step: 9310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:38:06,152-Speed 3111.44 samples/sec   Loss 17.2321   LearningRate 0.0926   Epoch: 0   Global Step: 9320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:38:09,431-Speed 3123.67 samples/sec   Loss 17.2158   LearningRate 0.0926   Epoch: 0   Global Step: 9330   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 02:38:12,740-Speed 3095.75 samples/sec   Loss 17.2220   LearningRate 0.0926   Epoch: 0   Global Step: 9340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:38:16,046-Speed 3099.27 samples/sec   Loss 17.3067   LearningRate 0.0926   Epoch: 0   Global Step: 9350   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:38:19,404-Speed 3050.54 samples/sec   Loss 17.1622   LearningRate 0.0926   Epoch: 0   Global Step: 9360   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:38:22,709-Speed 3099.30 samples/sec   Loss 17.3922   LearningRate 0.0926   Epoch: 0   Global Step: 9370   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:38:26,022-Speed 3091.36 samples/sec   Loss 17.2865   LearningRate 0.0926   Epoch: 0   Global Step: 9380   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:38:29,404-Speed 3028.92 samples/sec   Loss 17.2986   LearningRate 0.0926   Epoch: 0   Global Step: 9390   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:38:32,721-Speed 3088.12 samples/sec   Loss 17.2783   LearningRate 0.0926   Epoch: 0   Global Step: 9400   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:38:36,148-Speed 2988.77 samples/sec   Loss 17.2584   LearningRate 0.0926   Epoch: 0   Global Step: 9410   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:38:39,523-Speed 3035.27 samples/sec   Loss 17.2808   LearningRate 0.0926   Epoch: 0   Global Step: 9420   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:38:42,816-Speed 3110.65 samples/sec   Loss 17.3021   LearningRate 0.0926   Epoch: 0   Global Step: 9430   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:38:46,116-Speed 3103.46 samples/sec   Loss 17.1773   LearningRate 0.0925   Epoch: 0   Global Step: 9440   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 02:38:49,481-Speed 3043.60 samples/sec   Loss 17.2579   LearningRate 0.0925   Epoch: 0   Global Step: 9450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:38:52,782-Speed 3103.22 samples/sec   Loss 17.2416   LearningRate 0.0925   Epoch: 0   Global Step: 9460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:38:56,090-Speed 3096.78 samples/sec   Loss 17.4052   LearningRate 0.0925   Epoch: 0   Global Step: 9470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:38:59,410-Speed 3084.92 samples/sec   Loss 17.3125   LearningRate 0.0925   Epoch: 0   Global Step: 9480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:39:02,732-Speed 3083.19 samples/sec   Loss 17.0603   LearningRate 0.0925   Epoch: 0   Global Step: 9490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:39:06,082-Speed 3057.50 samples/sec   Loss 17.0298   LearningRate 0.0925   Epoch: 0   Global Step: 9500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:39:09,440-Speed 3050.80 samples/sec   Loss 17.1837   LearningRate 0.0925   Epoch: 0   Global Step: 9510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:39:12,807-Speed 3042.06 samples/sec   Loss 17.0961   LearningRate 0.0925   Epoch: 0   Global Step: 9520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:39:16,071-Speed 3138.61 samples/sec   Loss 17.1638   LearningRate 0.0925   Epoch: 0   Global Step: 9530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:39:19,425-Speed 3053.22 samples/sec   Loss 17.1772   LearningRate 0.0925   Epoch: 0   Global Step: 9540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:39:22,818-Speed 3019.05 samples/sec   Loss 17.1347   LearningRate 0.0925   Epoch: 0   Global Step: 9550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:39:26,165-Speed 3060.85 samples/sec   Loss 17.2832   LearningRate 0.0925   Epoch: 0   Global Step: 9560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:39:29,487-Speed 3083.00 samples/sec   Loss 17.0081   LearningRate 0.0924   Epoch: 0   Global Step: 9570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:39:32,807-Speed 3084.73 samples/sec   Loss 17.2992   LearningRate 0.0924   Epoch: 0   Global Step: 9580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:39:36,074-Speed 3135.54 samples/sec   Loss 17.2285   LearningRate 0.0924   Epoch: 0   Global Step: 9590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:39:39,471-Speed 3015.69 samples/sec   Loss 17.2818   LearningRate 0.0924   Epoch: 0   Global Step: 9600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:39:42,789-Speed 3086.76 samples/sec   Loss 17.3021   LearningRate 0.0924   Epoch: 0   Global Step: 9610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:39:46,197-Speed 3005.58 samples/sec   Loss 17.2346   LearningRate 0.0924   Epoch: 0   Global Step: 9620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:39:49,495-Speed 3106.24 samples/sec   Loss 17.1571   LearningRate 0.0924   Epoch: 0   Global Step: 9630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:39:52,816-Speed 3084.57 samples/sec   Loss 17.2436   LearningRate 0.0924   Epoch: 0   Global Step: 9640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:39:56,141-Speed 3080.59 samples/sec   Loss 17.1263   LearningRate 0.0924   Epoch: 0   Global Step: 9650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:39:59,444-Speed 3101.54 samples/sec   Loss 17.1399   LearningRate 0.0924   Epoch: 0   Global Step: 9660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:02,752-Speed 3096.01 samples/sec   Loss 17.0466   LearningRate 0.0924   Epoch: 0   Global Step: 9670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:06,082-Speed 3076.07 samples/sec   Loss 17.0773   LearningRate 0.0924   Epoch: 0   Global Step: 9680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:09,444-Speed 3047.23 samples/sec   Loss 17.1119   LearningRate 0.0924   Epoch: 0   Global Step: 9690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:12,757-Speed 3091.15 samples/sec   Loss 16.9363   LearningRate 0.0923   Epoch: 0   Global Step: 9700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:16,112-Speed 3053.25 samples/sec   Loss 17.0753   LearningRate 0.0923   Epoch: 0   Global Step: 9710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:19,516-Speed 3009.48 samples/sec   Loss 17.0686   LearningRate 0.0923   Epoch: 0   Global Step: 9720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:22,851-Speed 3070.56 samples/sec   Loss 17.1548   LearningRate 0.0923   Epoch: 0   Global Step: 9730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:26,192-Speed 3066.56 samples/sec   Loss 17.0724   LearningRate 0.0923   Epoch: 0   Global Step: 9740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:29,491-Speed 3104.60 samples/sec   Loss 17.0646   LearningRate 0.0923   Epoch: 0   Global Step: 9750   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 02:40:32,849-Speed 3050.39 samples/sec   Loss 17.3209   LearningRate 0.0923   Epoch: 0   Global Step: 9760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:36,150-Speed 3103.32 samples/sec   Loss 17.0678   LearningRate 0.0923   Epoch: 0   Global Step: 9770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:39,445-Speed 3108.86 samples/sec   Loss 17.1809   LearningRate 0.0923   Epoch: 0   Global Step: 9780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:42,814-Speed 3039.88 samples/sec   Loss 17.1011   LearningRate 0.0923   Epoch: 0   Global Step: 9790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:46,191-Speed 3033.37 samples/sec   Loss 17.1482   LearningRate 0.0923   Epoch: 0   Global Step: 9800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:49,468-Speed 3126.28 samples/sec   Loss 17.2073   LearningRate 0.0923   Epoch: 0   Global Step: 9810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:52,767-Speed 3105.04 samples/sec   Loss 17.0879   LearningRate 0.0923   Epoch: 0   Global Step: 9820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:56,160-Speed 3018.61 samples/sec   Loss 17.2228   LearningRate 0.0922   Epoch: 0   Global Step: 9830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:40:59,510-Speed 3057.79 samples/sec   Loss 17.2557   LearningRate 0.0922   Epoch: 0   Global Step: 9840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:02,832-Speed 3082.87 samples/sec   Loss 17.0539   LearningRate 0.0922   Epoch: 0   Global Step: 9850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:06,192-Speed 3049.11 samples/sec   Loss 17.1212   LearningRate 0.0922   Epoch: 0   Global Step: 9860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:09,524-Speed 3073.76 samples/sec   Loss 17.0799   LearningRate 0.0922   Epoch: 0   Global Step: 9870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:12,878-Speed 3053.95 samples/sec   Loss 16.9762   LearningRate 0.0922   Epoch: 0   Global Step: 9880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:16,207-Speed 3077.28 samples/sec   Loss 17.0514   LearningRate 0.0922   Epoch: 0   Global Step: 9890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:19,520-Speed 3091.76 samples/sec   Loss 16.9554   LearningRate 0.0922   Epoch: 0   Global Step: 9900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:22,829-Speed 3095.39 samples/sec   Loss 17.0122   LearningRate 0.0922   Epoch: 0   Global Step: 9910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:26,184-Speed 3053.13 samples/sec   Loss 16.8065   LearningRate 0.0922   Epoch: 0   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:29,554-Speed 3038.94 samples/sec   Loss 17.1797   LearningRate 0.0922   Epoch: 0   Global Step: 9930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:32,891-Speed 3069.92 samples/sec   Loss 16.9584   LearningRate 0.0922   Epoch: 0   Global Step: 9940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:36,272-Speed 3029.79 samples/sec   Loss 17.1549   LearningRate 0.0921   Epoch: 0   Global Step: 9950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:39,608-Speed 3070.12 samples/sec   Loss 17.0574   LearningRate 0.0921   Epoch: 0   Global Step: 9960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:42,911-Speed 3100.88 samples/sec   Loss 17.1059   LearningRate 0.0921   Epoch: 0   Global Step: 9970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:46,268-Speed 3052.03 samples/sec   Loss 16.9694   LearningRate 0.0921   Epoch: 0   Global Step: 9980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:49,577-Speed 3094.84 samples/sec   Loss 16.9030   LearningRate 0.0921   Epoch: 0   Global Step: 9990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:52,915-Speed 3069.60 samples/sec   Loss 17.1578   LearningRate 0.0921   Epoch: 0   Global Step: 10000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:56,206-Speed 3112.30 samples/sec   Loss 16.8925   LearningRate 0.0921   Epoch: 0   Global Step: 10010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:41:59,523-Speed 3088.72 samples/sec   Loss 17.0468   LearningRate 0.0921   Epoch: 0   Global Step: 10020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:02,794-Speed 3131.88 samples/sec   Loss 17.0098   LearningRate 0.0921   Epoch: 0   Global Step: 10030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:06,062-Speed 3133.86 samples/sec   Loss 16.8644   LearningRate 0.0921   Epoch: 0   Global Step: 10040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:09,360-Speed 3106.44 samples/sec   Loss 17.0388   LearningRate 0.0921   Epoch: 0   Global Step: 10050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:12,704-Speed 3063.03 samples/sec   Loss 16.9868   LearningRate 0.0921   Epoch: 0   Global Step: 10060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:16,029-Speed 3080.71 samples/sec   Loss 16.8921   LearningRate 0.0921   Epoch: 0   Global Step: 10070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:19,335-Speed 3098.45 samples/sec   Loss 17.0205   LearningRate 0.0920   Epoch: 0   Global Step: 10080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:22,636-Speed 3102.81 samples/sec   Loss 16.9533   LearningRate 0.0920   Epoch: 0   Global Step: 10090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:25,952-Speed 3089.30 samples/sec   Loss 17.0867   LearningRate 0.0920   Epoch: 0   Global Step: 10100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:29,266-Speed 3091.26 samples/sec   Loss 17.1760   LearningRate 0.0920   Epoch: 0   Global Step: 10110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:32,596-Speed 3075.77 samples/sec   Loss 16.8928   LearningRate 0.0920   Epoch: 0   Global Step: 10120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:35,967-Speed 3038.85 samples/sec   Loss 16.9908   LearningRate 0.0920   Epoch: 0   Global Step: 10130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:39,314-Speed 3060.47 samples/sec   Loss 16.8698   LearningRate 0.0920   Epoch: 0   Global Step: 10140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:42,670-Speed 3051.79 samples/sec   Loss 16.9795   LearningRate 0.0920   Epoch: 0   Global Step: 10150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:45,992-Speed 3083.74 samples/sec   Loss 17.0486   LearningRate 0.0920   Epoch: 0   Global Step: 10160   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 02:42:49,256-Speed 3137.99 samples/sec   Loss 16.9037   LearningRate 0.0920   Epoch: 0   Global Step: 10170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:52,608-Speed 3055.22 samples/sec   Loss 16.8647   LearningRate 0.0920   Epoch: 0   Global Step: 10180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:55,942-Speed 3072.35 samples/sec   Loss 16.9982   LearningRate 0.0920   Epoch: 0   Global Step: 10190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:42:59,252-Speed 3094.85 samples/sec   Loss 17.0433   LearningRate 0.0920   Epoch: 0   Global Step: 10200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:43:02,539-Speed 3116.09 samples/sec   Loss 17.0382   LearningRate 0.0919   Epoch: 0   Global Step: 10210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:43:05,856-Speed 3088.25 samples/sec   Loss 17.0364   LearningRate 0.0919   Epoch: 0   Global Step: 10220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:43:09,230-Speed 3035.67 samples/sec   Loss 16.8776   LearningRate 0.0919   Epoch: 0   Global Step: 10230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:43:12,512-Speed 3120.82 samples/sec   Loss 16.9949   LearningRate 0.0919   Epoch: 0   Global Step: 10240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:43:15,810-Speed 3106.44 samples/sec   Loss 16.7922   LearningRate 0.0919   Epoch: 0   Global Step: 10250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:43:19,078-Speed 3134.21 samples/sec   Loss 16.8799   LearningRate 0.0919   Epoch: 0   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:43:22,332-Speed 3147.56 samples/sec   Loss 16.7737   LearningRate 0.0919   Epoch: 0   Global Step: 10270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:43:25,589-Speed 3146.07 samples/sec   Loss 16.7983   LearningRate 0.0919   Epoch: 0   Global Step: 10280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:43:28,908-Speed 3085.85 samples/sec   Loss 16.7373   LearningRate 0.0919   Epoch: 0   Global Step: 10290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:43:32,203-Speed 3108.93 samples/sec   Loss 16.9649   LearningRate 0.0919   Epoch: 0   Global Step: 10300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:43:35,498-Speed 3108.79 samples/sec   Loss 16.7647   LearningRate 0.0919   Epoch: 0   Global Step: 10310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:43:38,768-Speed 3131.92 samples/sec   Loss 16.8457   LearningRate 0.0919   Epoch: 0   Global Step: 10320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:43:42,052-Speed 3119.51 samples/sec   Loss 16.8785   LearningRate 0.0919   Epoch: 0   Global Step: 10330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:43:45,387-Speed 3073.45 samples/sec   Loss 16.9440   LearningRate 0.0918   Epoch: 0   Global Step: 10340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:43:48,739-Speed 3055.59 samples/sec   Loss 16.8669   LearningRate 0.0918   Epoch: 0   Global Step: 10350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:43:52,072-Speed 3073.22 samples/sec   Loss 16.9352   LearningRate 0.0918   Epoch: 0   Global Step: 10360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:43:55,377-Speed 3099.18 samples/sec   Loss 16.7513   LearningRate 0.0918   Epoch: 0   Global Step: 10370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:43:58,670-Speed 3111.25 samples/sec   Loss 16.8463   LearningRate 0.0918   Epoch: 0   Global Step: 10380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:44:01,990-Speed 3085.08 samples/sec   Loss 16.8381   LearningRate 0.0918   Epoch: 0   Global Step: 10390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:44:05,274-Speed 3119.40 samples/sec   Loss 16.7063   LearningRate 0.0918   Epoch: 0   Global Step: 10400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:08,647-Speed 3037.12 samples/sec   Loss 16.7585   LearningRate 0.0918   Epoch: 0   Global Step: 10410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:12,021-Speed 3035.76 samples/sec   Loss 16.8905   LearningRate 0.0918   Epoch: 0   Global Step: 10420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:15,400-Speed 3031.56 samples/sec   Loss 16.8120   LearningRate 0.0918   Epoch: 0   Global Step: 10430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:18,731-Speed 3075.21 samples/sec   Loss 16.8998   LearningRate 0.0918   Epoch: 0   Global Step: 10440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:22,005-Speed 3128.63 samples/sec   Loss 16.7908   LearningRate 0.0918   Epoch: 0   Global Step: 10450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:25,370-Speed 3044.28 samples/sec   Loss 16.7161   LearningRate 0.0918   Epoch: 0   Global Step: 10460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:28,704-Speed 3072.36 samples/sec   Loss 16.9426   LearningRate 0.0917   Epoch: 0   Global Step: 10470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:32,118-Speed 3000.22 samples/sec   Loss 16.9452   LearningRate 0.0917   Epoch: 0   Global Step: 10480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:35,405-Speed 3116.51 samples/sec   Loss 16.9470   LearningRate 0.0917   Epoch: 0   Global Step: 10490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:38,653-Speed 3152.92 samples/sec   Loss 16.8677   LearningRate 0.0917   Epoch: 0   Global Step: 10500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:41,983-Speed 3076.69 samples/sec   Loss 16.8201   LearningRate 0.0917   Epoch: 0   Global Step: 10510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:45,334-Speed 3056.37 samples/sec   Loss 16.6929   LearningRate 0.0917   Epoch: 0   Global Step: 10520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:48,679-Speed 3061.59 samples/sec   Loss 16.6830   LearningRate 0.0917   Epoch: 0   Global Step: 10530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:52,008-Speed 3077.13 samples/sec   Loss 16.8600   LearningRate 0.0917   Epoch: 0   Global Step: 10540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:55,386-Speed 3032.39 samples/sec   Loss 16.7128   LearningRate 0.0917   Epoch: 0   Global Step: 10550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:44:58,760-Speed 3036.28 samples/sec   Loss 16.8211   LearningRate 0.0917   Epoch: 0   Global Step: 10560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:45:02,090-Speed 3075.76 samples/sec   Loss 16.8903   LearningRate 0.0917   Epoch: 0   Global Step: 10570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:45:05,397-Speed 3097.70 samples/sec   Loss 16.7409   LearningRate 0.0917   Epoch: 0   Global Step: 10580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:45:08,684-Speed 3115.88 samples/sec   Loss 16.6223   LearningRate 0.0917   Epoch: 0   Global Step: 10590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:45:12,007-Speed 3082.64 samples/sec   Loss 16.7094   LearningRate 0.0916   Epoch: 0   Global Step: 10600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:45:15,404-Speed 3015.62 samples/sec   Loss 16.7530   LearningRate 0.0916   Epoch: 0   Global Step: 10610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:45:18,781-Speed 3033.33 samples/sec   Loss 16.7522   LearningRate 0.0916   Epoch: 0   Global Step: 10620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:45:22,122-Speed 3065.98 samples/sec   Loss 16.7940   LearningRate 0.0916   Epoch: 0   Global Step: 10630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:45:25,481-Speed 3049.57 samples/sec   Loss 16.6685   LearningRate 0.0916   Epoch: 0   Global Step: 10640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:45:28,866-Speed 3025.53 samples/sec   Loss 16.9121   LearningRate 0.0916   Epoch: 0   Global Step: 10650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:45:32,240-Speed 3035.93 samples/sec   Loss 16.7570   LearningRate 0.0916   Epoch: 0   Global Step: 10660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:45:35,595-Speed 3052.60 samples/sec   Loss 16.6403   LearningRate 0.0916   Epoch: 0   Global Step: 10670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:45:38,920-Speed 3081.50 samples/sec   Loss 16.6360   LearningRate 0.0916   Epoch: 0   Global Step: 10680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:45:42,243-Speed 3081.78 samples/sec   Loss 16.7850   LearningRate 0.0916   Epoch: 0   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:45:45,545-Speed 3102.81 samples/sec   Loss 16.7065   LearningRate 0.0916   Epoch: 0   Global Step: 10700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:45:48,917-Speed 3037.27 samples/sec   Loss 16.9738   LearningRate 0.0916   Epoch: 0   Global Step: 10710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:45:52,245-Speed 3077.69 samples/sec   Loss 16.6804   LearningRate 0.0916   Epoch: 0   Global Step: 10720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:45:55,595-Speed 3057.85 samples/sec   Loss 16.8791   LearningRate 0.0915   Epoch: 0   Global Step: 10730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:45:58,968-Speed 3036.93 samples/sec   Loss 16.5989   LearningRate 0.0915   Epoch: 0   Global Step: 10740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:46:02,331-Speed 3045.87 samples/sec   Loss 16.8175   LearningRate 0.0915   Epoch: 0   Global Step: 10750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:46:05,638-Speed 3097.47 samples/sec   Loss 16.5677   LearningRate 0.0915   Epoch: 0   Global Step: 10760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:46:09,022-Speed 3026.49 samples/sec   Loss 16.4066   LearningRate 0.0915   Epoch: 0   Global Step: 10770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:46:12,330-Speed 3095.95 samples/sec   Loss 16.8191   LearningRate 0.0915   Epoch: 0   Global Step: 10780   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 02:46:15,667-Speed 3070.12 samples/sec   Loss 16.8569   LearningRate 0.0915   Epoch: 0   Global Step: 10790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:46:19,076-Speed 3004.73 samples/sec   Loss 16.6558   LearningRate 0.0915   Epoch: 0   Global Step: 10800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:46:22,398-Speed 3082.94 samples/sec   Loss 16.4935   LearningRate 0.0915   Epoch: 0   Global Step: 10810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:46:25,655-Speed 3145.38 samples/sec   Loss 16.7167   LearningRate 0.0915   Epoch: 0   Global Step: 10820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:46:28,970-Speed 3089.44 samples/sec   Loss 16.7032   LearningRate 0.0915   Epoch: 0   Global Step: 10830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:46:32,261-Speed 3112.62 samples/sec   Loss 16.7091   LearningRate 0.0915   Epoch: 0   Global Step: 10840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:46:35,552-Speed 3112.84 samples/sec   Loss 16.4892   LearningRate 0.0915   Epoch: 0   Global Step: 10850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:46:38,852-Speed 3103.95 samples/sec   Loss 16.7831   LearningRate 0.0914   Epoch: 0   Global Step: 10860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:46:42,202-Speed 3057.40 samples/sec   Loss 16.8078   LearningRate 0.0914   Epoch: 0   Global Step: 10870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:46:45,559-Speed 3051.63 samples/sec   Loss 16.6877   LearningRate 0.0914   Epoch: 0   Global Step: 10880   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:46:48,918-Speed 3049.27 samples/sec   Loss 16.6148   LearningRate 0.0914   Epoch: 0   Global Step: 10890   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:46:52,188-Speed 3132.40 samples/sec   Loss 16.6374   LearningRate 0.0914   Epoch: 0   Global Step: 10900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:46:55,440-Speed 3149.55 samples/sec   Loss 16.6191   LearningRate 0.0914   Epoch: 0   Global Step: 10910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:46:58,729-Speed 3114.53 samples/sec   Loss 16.6294   LearningRate 0.0914   Epoch: 0   Global Step: 10920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:02,032-Speed 3101.40 samples/sec   Loss 16.7405   LearningRate 0.0914   Epoch: 0   Global Step: 10930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:05,345-Speed 3092.02 samples/sec   Loss 16.7966   LearningRate 0.0914   Epoch: 0   Global Step: 10940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:08,611-Speed 3137.41 samples/sec   Loss 16.5467   LearningRate 0.0914   Epoch: 0   Global Step: 10950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:11,912-Speed 3102.93 samples/sec   Loss 16.6581   LearningRate 0.0914   Epoch: 0   Global Step: 10960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:15,284-Speed 3037.61 samples/sec   Loss 16.6520   LearningRate 0.0914   Epoch: 0   Global Step: 10970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:18,662-Speed 3033.09 samples/sec   Loss 16.6672   LearningRate 0.0914   Epoch: 0   Global Step: 10980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:21,990-Speed 3077.49 samples/sec   Loss 16.5655   LearningRate 0.0913   Epoch: 0   Global Step: 10990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:25,330-Speed 3066.87 samples/sec   Loss 16.6240   LearningRate 0.0913   Epoch: 0   Global Step: 11000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:28,663-Speed 3073.55 samples/sec   Loss 16.6023   LearningRate 0.0913   Epoch: 0   Global Step: 11010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:31,992-Speed 3076.78 samples/sec   Loss 16.5194   LearningRate 0.0913   Epoch: 0   Global Step: 11020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:35,296-Speed 3100.81 samples/sec   Loss 16.7039   LearningRate 0.0913   Epoch: 0   Global Step: 11030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:38,634-Speed 3068.04 samples/sec   Loss 16.6981   LearningRate 0.0913   Epoch: 0   Global Step: 11040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:41,985-Speed 3057.11 samples/sec   Loss 16.5740   LearningRate 0.0913   Epoch: 0   Global Step: 11050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:45,283-Speed 3106.57 samples/sec   Loss 16.6028   LearningRate 0.0913   Epoch: 0   Global Step: 11060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:48,627-Speed 3062.81 samples/sec   Loss 16.6027   LearningRate 0.0913   Epoch: 0   Global Step: 11070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:52,026-Speed 3014.38 samples/sec   Loss 16.5663   LearningRate 0.0913   Epoch: 0   Global Step: 11080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:55,405-Speed 3030.52 samples/sec   Loss 16.7705   LearningRate 0.0913   Epoch: 0   Global Step: 11090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:47:58,730-Speed 3080.81 samples/sec   Loss 16.5998   LearningRate 0.0913   Epoch: 0   Global Step: 11100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:02,040-Speed 3094.91 samples/sec   Loss 16.3868   LearningRate 0.0913   Epoch: 0   Global Step: 11110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:05,343-Speed 3100.82 samples/sec   Loss 16.6244   LearningRate 0.0912   Epoch: 0   Global Step: 11120   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 02:48:08,632-Speed 3114.41 samples/sec   Loss 16.3685   LearningRate 0.0912   Epoch: 0   Global Step: 11130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:11,939-Speed 3096.81 samples/sec   Loss 16.5548   LearningRate 0.0912   Epoch: 0   Global Step: 11140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:15,269-Speed 3076.93 samples/sec   Loss 16.4868   LearningRate 0.0912   Epoch: 0   Global Step: 11150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:18,564-Speed 3108.39 samples/sec   Loss 16.5884   LearningRate 0.0912   Epoch: 0   Global Step: 11160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:21,869-Speed 3100.56 samples/sec   Loss 16.7002   LearningRate 0.0912   Epoch: 0   Global Step: 11170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:25,275-Speed 3007.11 samples/sec   Loss 16.8109   LearningRate 0.0912   Epoch: 0   Global Step: 11180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:28,556-Speed 3122.00 samples/sec   Loss 16.5306   LearningRate 0.0912   Epoch: 0   Global Step: 11190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:31,941-Speed 3025.85 samples/sec   Loss 16.7107   LearningRate 0.0912   Epoch: 0   Global Step: 11200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:35,255-Speed 3090.24 samples/sec   Loss 16.5106   LearningRate 0.0912   Epoch: 0   Global Step: 11210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:38,508-Speed 3148.87 samples/sec   Loss 16.4677   LearningRate 0.0912   Epoch: 0   Global Step: 11220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:41,791-Speed 3119.94 samples/sec   Loss 16.3701   LearningRate 0.0912   Epoch: 0   Global Step: 11230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:45,127-Speed 3070.58 samples/sec   Loss 16.4473   LearningRate 0.0912   Epoch: 0   Global Step: 11240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:48,404-Speed 3125.66 samples/sec   Loss 16.5262   LearningRate 0.0911   Epoch: 0   Global Step: 11250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:51,678-Speed 3129.13 samples/sec   Loss 16.7497   LearningRate 0.0911   Epoch: 0   Global Step: 11260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:54,938-Speed 3141.67 samples/sec   Loss 16.6490   LearningRate 0.0911   Epoch: 0   Global Step: 11270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:48:58,251-Speed 3091.77 samples/sec   Loss 16.6507   LearningRate 0.0911   Epoch: 0   Global Step: 11280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:01,565-Speed 3090.53 samples/sec   Loss 16.4487   LearningRate 0.0911   Epoch: 0   Global Step: 11290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:04,818-Speed 3149.00 samples/sec   Loss 16.5035   LearningRate 0.0911   Epoch: 0   Global Step: 11300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:08,180-Speed 3046.53 samples/sec   Loss 16.6468   LearningRate 0.0911   Epoch: 0   Global Step: 11310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:11,479-Speed 3105.09 samples/sec   Loss 16.5853   LearningRate 0.0911   Epoch: 0   Global Step: 11320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:14,744-Speed 3137.57 samples/sec   Loss 16.6617   LearningRate 0.0911   Epoch: 0   Global Step: 11330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:18,047-Speed 3100.85 samples/sec   Loss 16.6227   LearningRate 0.0911   Epoch: 0   Global Step: 11340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:21,360-Speed 3091.63 samples/sec   Loss 16.5954   LearningRate 0.0911   Epoch: 0   Global Step: 11350   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:24,702-Speed 3065.34 samples/sec   Loss 16.5837   LearningRate 0.0911   Epoch: 0   Global Step: 11360   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:28,004-Speed 3101.57 samples/sec   Loss 16.4805   LearningRate 0.0911   Epoch: 0   Global Step: 11370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:31,372-Speed 3041.15 samples/sec   Loss 16.4195   LearningRate 0.0910   Epoch: 0   Global Step: 11380   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:34,669-Speed 3107.33 samples/sec   Loss 16.4717   LearningRate 0.0910   Epoch: 0   Global Step: 11390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:38,033-Speed 3045.14 samples/sec   Loss 16.3940   LearningRate 0.0910   Epoch: 0   Global Step: 11400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:41,346-Speed 3092.13 samples/sec   Loss 16.4208   LearningRate 0.0910   Epoch: 0   Global Step: 11410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:44,688-Speed 3064.94 samples/sec   Loss 16.5680   LearningRate 0.0910   Epoch: 0   Global Step: 11420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:48,024-Speed 3070.29 samples/sec   Loss 16.5615   LearningRate 0.0910   Epoch: 0   Global Step: 11430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:51,277-Speed 3149.17 samples/sec   Loss 16.3639   LearningRate 0.0910   Epoch: 0   Global Step: 11440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:54,557-Speed 3122.59 samples/sec   Loss 16.4942   LearningRate 0.0910   Epoch: 0   Global Step: 11450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:49:57,854-Speed 3107.04 samples/sec   Loss 16.3812   LearningRate 0.0910   Epoch: 0   Global Step: 11460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:50:01,127-Speed 3129.22 samples/sec   Loss 16.3440   LearningRate 0.0910   Epoch: 0   Global Step: 11470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:50:04,424-Speed 3107.69 samples/sec   Loss 16.2078   LearningRate 0.0910   Epoch: 0   Global Step: 11480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:50:07,761-Speed 3069.54 samples/sec   Loss 16.5060   LearningRate 0.0910   Epoch: 0   Global Step: 11490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:50:11,061-Speed 3103.25 samples/sec   Loss 16.3777   LearningRate 0.0910   Epoch: 0   Global Step: 11500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:50:14,403-Speed 3065.01 samples/sec   Loss 16.4632   LearningRate 0.0909   Epoch: 0   Global Step: 11510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:50:17,673-Speed 3132.71 samples/sec   Loss 16.3848   LearningRate 0.0909   Epoch: 0   Global Step: 11520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:50:21,028-Speed 3052.74 samples/sec   Loss 16.5104   LearningRate 0.0909   Epoch: 0   Global Step: 11530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:50:24,393-Speed 3044.12 samples/sec   Loss 16.3942   LearningRate 0.0909   Epoch: 0   Global Step: 11540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:50:27,667-Speed 3128.26 samples/sec   Loss 16.3609   LearningRate 0.0909   Epoch: 0   Global Step: 11550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:50:31,000-Speed 3072.99 samples/sec   Loss 16.3314   LearningRate 0.0909   Epoch: 0   Global Step: 11560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:50:34,338-Speed 3069.01 samples/sec   Loss 16.4701   LearningRate 0.0909   Epoch: 0   Global Step: 11570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:50:37,588-Speed 3151.06 samples/sec   Loss 16.3791   LearningRate 0.0909   Epoch: 0   Global Step: 11580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:50:40,886-Speed 3105.74 samples/sec   Loss 16.4381   LearningRate 0.0909   Epoch: 0   Global Step: 11590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:50:44,206-Speed 3085.80 samples/sec   Loss 16.4607   LearningRate 0.0909   Epoch: 0   Global Step: 11600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:50:47,478-Speed 3129.41 samples/sec   Loss 16.3916   LearningRate 0.0909   Epoch: 0   Global Step: 11610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:50:50,794-Speed 3089.63 samples/sec   Loss 16.3516   LearningRate 0.0909   Epoch: 0   Global Step: 11620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:50:54,083-Speed 3114.21 samples/sec   Loss 16.3277   LearningRate 0.0909   Epoch: 0   Global Step: 11630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:50:57,366-Speed 3120.05 samples/sec   Loss 16.5946   LearningRate 0.0908   Epoch: 0   Global Step: 11640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:00,765-Speed 3013.21 samples/sec   Loss 16.3211   LearningRate 0.0908   Epoch: 0   Global Step: 11650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:04,066-Speed 3103.62 samples/sec   Loss 16.4943   LearningRate 0.0908   Epoch: 0   Global Step: 11660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:07,323-Speed 3144.55 samples/sec   Loss 16.6607   LearningRate 0.0908   Epoch: 0   Global Step: 11670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:10,679-Speed 3052.61 samples/sec   Loss 16.4356   LearningRate 0.0908   Epoch: 0   Global Step: 11680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:14,004-Speed 3080.92 samples/sec   Loss 16.2835   LearningRate 0.0908   Epoch: 0   Global Step: 11690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:17,283-Speed 3124.11 samples/sec   Loss 16.5251   LearningRate 0.0908   Epoch: 0   Global Step: 11700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:20,557-Speed 3129.41 samples/sec   Loss 16.2367   LearningRate 0.0908   Epoch: 0   Global Step: 11710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:23,813-Speed 3144.92 samples/sec   Loss 16.4716   LearningRate 0.0908   Epoch: 0   Global Step: 11720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:27,160-Speed 3060.68 samples/sec   Loss 16.2839   LearningRate 0.0908   Epoch: 0   Global Step: 11730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:30,471-Speed 3094.19 samples/sec   Loss 16.3159   LearningRate 0.0908   Epoch: 0   Global Step: 11740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:33,797-Speed 3079.53 samples/sec   Loss 16.4093   LearningRate 0.0908   Epoch: 0   Global Step: 11750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:37,100-Speed 3101.11 samples/sec   Loss 16.4318   LearningRate 0.0908   Epoch: 0   Global Step: 11760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:40,466-Speed 3042.66 samples/sec   Loss 16.4872   LearningRate 0.0907   Epoch: 0   Global Step: 11770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:51:43,744-Speed 3125.32 samples/sec   Loss 16.3467   LearningRate 0.0907   Epoch: 0   Global Step: 11780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:51:47,115-Speed 3038.66 samples/sec   Loss 16.2837   LearningRate 0.0907   Epoch: 0   Global Step: 11790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:51:50,384-Speed 3132.88 samples/sec   Loss 16.2637   LearningRate 0.0907   Epoch: 0   Global Step: 11800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:51:53,685-Speed 3103.40 samples/sec   Loss 16.5279   LearningRate 0.0907   Epoch: 0   Global Step: 11810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:51:57,002-Speed 3088.33 samples/sec   Loss 16.2531   LearningRate 0.0907   Epoch: 0   Global Step: 11820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:52:00,312-Speed 3093.94 samples/sec   Loss 16.3700   LearningRate 0.0907   Epoch: 0   Global Step: 11830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:52:03,627-Speed 3089.89 samples/sec   Loss 16.4625   LearningRate 0.0907   Epoch: 0   Global Step: 11840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:52:06,907-Speed 3123.58 samples/sec   Loss 16.4876   LearningRate 0.0907   Epoch: 0   Global Step: 11850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:52:10,226-Speed 3085.54 samples/sec   Loss 16.2732   LearningRate 0.0907   Epoch: 0   Global Step: 11860   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:52:13,550-Speed 3081.06 samples/sec   Loss 16.3740   LearningRate 0.0907   Epoch: 0   Global Step: 11870   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:52:16,868-Speed 3088.01 samples/sec   Loss 16.2265   LearningRate 0.0907   Epoch: 0   Global Step: 11880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:52:20,247-Speed 3031.26 samples/sec   Loss 16.3603   LearningRate 0.0907   Epoch: 0   Global Step: 11890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:52:23,563-Speed 3089.37 samples/sec   Loss 16.3864   LearningRate 0.0906   Epoch: 0   Global Step: 11900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:52:26,885-Speed 3083.28 samples/sec   Loss 16.2316   LearningRate 0.0906   Epoch: 0   Global Step: 11910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:52:30,261-Speed 3033.89 samples/sec   Loss 16.3167   LearningRate 0.0906   Epoch: 0   Global Step: 11920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:52:33,529-Speed 3135.06 samples/sec   Loss 16.4669   LearningRate 0.0906   Epoch: 0   Global Step: 11930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:52:36,829-Speed 3103.57 samples/sec   Loss 16.3446   LearningRate 0.0906   Epoch: 0   Global Step: 11940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:52:40,227-Speed 3014.19 samples/sec   Loss 16.4713   LearningRate 0.0906   Epoch: 0   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:52:43,570-Speed 3064.00 samples/sec   Loss 16.1059   LearningRate 0.0906   Epoch: 0   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:52:46,877-Speed 3098.47 samples/sec   Loss 16.4929   LearningRate 0.0906   Epoch: 0   Global Step: 11970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:52:50,201-Speed 3080.99 samples/sec   Loss 16.2102   LearningRate 0.0906   Epoch: 0   Global Step: 11980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:52:53,566-Speed 3044.45 samples/sec   Loss 16.3605   LearningRate 0.0906   Epoch: 0   Global Step: 11990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:52:56,845-Speed 3123.99 samples/sec   Loss 16.2622   LearningRate 0.0906   Epoch: 0   Global Step: 12000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:00,194-Speed 3057.74 samples/sec   Loss 16.3318   LearningRate 0.0906   Epoch: 0   Global Step: 12010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:03,515-Speed 3084.38 samples/sec   Loss 16.3079   LearningRate 0.0906   Epoch: 0   Global Step: 12020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:06,879-Speed 3044.92 samples/sec   Loss 16.4042   LearningRate 0.0905   Epoch: 0   Global Step: 12030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:10,203-Speed 3081.91 samples/sec   Loss 16.3529   LearningRate 0.0905   Epoch: 0   Global Step: 12040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:13,539-Speed 3070.89 samples/sec   Loss 16.3095   LearningRate 0.0905   Epoch: 0   Global Step: 12050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:16,903-Speed 3044.58 samples/sec   Loss 16.4192   LearningRate 0.0905   Epoch: 0   Global Step: 12060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:20,219-Speed 3088.87 samples/sec   Loss 16.3990   LearningRate 0.0905   Epoch: 0   Global Step: 12070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:23,589-Speed 3039.68 samples/sec   Loss 16.2786   LearningRate 0.0905   Epoch: 0   Global Step: 12080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:26,856-Speed 3135.79 samples/sec   Loss 16.2042   LearningRate 0.0905   Epoch: 0   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:30,129-Speed 3129.29 samples/sec   Loss 16.1156   LearningRate 0.0905   Epoch: 0   Global Step: 12100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:33,482-Speed 3054.92 samples/sec   Loss 16.4011   LearningRate 0.0905   Epoch: 0   Global Step: 12110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:36,791-Speed 3095.93 samples/sec   Loss 16.3716   LearningRate 0.0905   Epoch: 0   Global Step: 12120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:40,130-Speed 3067.78 samples/sec   Loss 16.2198   LearningRate 0.0905   Epoch: 0   Global Step: 12130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:43,451-Speed 3084.30 samples/sec   Loss 16.3042   LearningRate 0.0905   Epoch: 0   Global Step: 12140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:46,817-Speed 3042.82 samples/sec   Loss 16.4575   LearningRate 0.0905   Epoch: 0   Global Step: 12150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:50,159-Speed 3064.70 samples/sec   Loss 16.3431   LearningRate 0.0904   Epoch: 0   Global Step: 12160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:53,523-Speed 3045.44 samples/sec   Loss 16.3703   LearningRate 0.0904   Epoch: 0   Global Step: 12170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:53:56,818-Speed 3108.99 samples/sec   Loss 16.4280   LearningRate 0.0904   Epoch: 0   Global Step: 12180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:00,127-Speed 3095.04 samples/sec   Loss 16.2319   LearningRate 0.0904   Epoch: 0   Global Step: 12190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:03,411-Speed 3119.86 samples/sec   Loss 16.2686   LearningRate 0.0904   Epoch: 0   Global Step: 12200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:06,726-Speed 3089.18 samples/sec   Loss 16.1769   LearningRate 0.0904   Epoch: 0   Global Step: 12210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:10,065-Speed 3068.42 samples/sec   Loss 16.2671   LearningRate 0.0904   Epoch: 0   Global Step: 12220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:13,380-Speed 3089.54 samples/sec   Loss 16.2229   LearningRate 0.0904   Epoch: 0   Global Step: 12230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:16,711-Speed 3075.42 samples/sec   Loss 16.3372   LearningRate 0.0904   Epoch: 0   Global Step: 12240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:20,082-Speed 3038.23 samples/sec   Loss 16.1857   LearningRate 0.0904   Epoch: 0   Global Step: 12250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:23,426-Speed 3063.40 samples/sec   Loss 16.1687   LearningRate 0.0904   Epoch: 0   Global Step: 12260   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:26,765-Speed 3067.78 samples/sec   Loss 16.2308   LearningRate 0.0904   Epoch: 0   Global Step: 12270   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:30,093-Speed 3077.63 samples/sec   Loss 16.2654   LearningRate 0.0904   Epoch: 0   Global Step: 12280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:54:33,423-Speed 3075.96 samples/sec   Loss 16.3179   LearningRate 0.0904   Epoch: 0   Global Step: 12290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:54:36,724-Speed 3103.72 samples/sec   Loss 16.2387   LearningRate 0.0903   Epoch: 0   Global Step: 12300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:54:40,102-Speed 3032.10 samples/sec   Loss 16.3022   LearningRate 0.0903   Epoch: 0   Global Step: 12310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:54:43,457-Speed 3053.22 samples/sec   Loss 16.1577   LearningRate 0.0903   Epoch: 0   Global Step: 12320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:46,814-Speed 3050.33 samples/sec   Loss 16.3790   LearningRate 0.0903   Epoch: 0   Global Step: 12330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:50,194-Speed 3031.16 samples/sec   Loss 16.2926   LearningRate 0.0903   Epoch: 0   Global Step: 12340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:53,574-Speed 3029.85 samples/sec   Loss 16.2590   LearningRate 0.0903   Epoch: 0   Global Step: 12350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:54:56,924-Speed 3058.04 samples/sec   Loss 16.2450   LearningRate 0.0903   Epoch: 0   Global Step: 12360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:55:00,238-Speed 3090.83 samples/sec   Loss 16.1958   LearningRate 0.0903   Epoch: 0   Global Step: 12370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:55:03,569-Speed 3074.94 samples/sec   Loss 16.2714   LearningRate 0.0903   Epoch: 0   Global Step: 12380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:55:06,896-Speed 3078.20 samples/sec   Loss 16.1941   LearningRate 0.0903   Epoch: 0   Global Step: 12390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:55:10,222-Speed 3079.73 samples/sec   Loss 16.2552   LearningRate 0.0903   Epoch: 0   Global Step: 12400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:55:13,726-Speed 2924.05 samples/sec   Loss 16.3147   LearningRate 0.0903   Epoch: 0   Global Step: 12410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:55:16,991-Speed 3136.71 samples/sec   Loss 16.1806   LearningRate 0.0903   Epoch: 0   Global Step: 12420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:55:49,255-Speed 317.40 samples/sec   Loss 14.8885   LearningRate 0.0902   Epoch: 1   Global Step: 12430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:55:52,674-Speed 2996.43 samples/sec   Loss 14.7384   LearningRate 0.0902   Epoch: 1   Global Step: 12440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:55:55,976-Speed 3102.16 samples/sec   Loss 14.5572   LearningRate 0.0902   Epoch: 1   Global Step: 12450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:55:59,273-Speed 3106.73 samples/sec   Loss 14.6650   LearningRate 0.0902   Epoch: 1   Global Step: 12460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:02,616-Speed 3064.37 samples/sec   Loss 14.6077   LearningRate 0.0902   Epoch: 1   Global Step: 12470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:05,944-Speed 3077.48 samples/sec   Loss 14.5340   LearningRate 0.0902   Epoch: 1   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:09,284-Speed 3066.74 samples/sec   Loss 14.5251   LearningRate 0.0902   Epoch: 1   Global Step: 12490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:12,639-Speed 3053.45 samples/sec   Loss 14.6209   LearningRate 0.0902   Epoch: 1   Global Step: 12500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:15,934-Speed 3108.01 samples/sec   Loss 14.5287   LearningRate 0.0902   Epoch: 1   Global Step: 12510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:19,229-Speed 3109.08 samples/sec   Loss 14.6184   LearningRate 0.0902   Epoch: 1   Global Step: 12520   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 02:56:22,593-Speed 3045.45 samples/sec   Loss 14.5791   LearningRate 0.0902   Epoch: 1   Global Step: 12530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:25,903-Speed 3093.88 samples/sec   Loss 14.6170   LearningRate 0.0902   Epoch: 1   Global Step: 12540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:29,241-Speed 3068.69 samples/sec   Loss 14.7209   LearningRate 0.0902   Epoch: 1   Global Step: 12550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:32,586-Speed 3062.77 samples/sec   Loss 14.8675   LearningRate 0.0901   Epoch: 1   Global Step: 12560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:35,909-Speed 3081.91 samples/sec   Loss 14.6752   LearningRate 0.0901   Epoch: 1   Global Step: 12570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:39,288-Speed 3031.47 samples/sec   Loss 14.7787   LearningRate 0.0901   Epoch: 1   Global Step: 12580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:42,642-Speed 3054.01 samples/sec   Loss 14.7225   LearningRate 0.0901   Epoch: 1   Global Step: 12590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:45,921-Speed 3124.54 samples/sec   Loss 14.6160   LearningRate 0.0901   Epoch: 1   Global Step: 12600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:56:49,205-Speed 3118.58 samples/sec   Loss 14.6300   LearningRate 0.0901   Epoch: 1   Global Step: 12610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:56:52,561-Speed 3052.47 samples/sec   Loss 14.7292   LearningRate 0.0901   Epoch: 1   Global Step: 12620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:56:55,841-Speed 3122.32 samples/sec   Loss 14.7992   LearningRate 0.0901   Epoch: 1   Global Step: 12630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:56:59,160-Speed 3086.39 samples/sec   Loss 14.8322   LearningRate 0.0901   Epoch: 1   Global Step: 12640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:57:02,456-Speed 3108.29 samples/sec   Loss 14.7572   LearningRate 0.0901   Epoch: 1   Global Step: 12650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:57:05,770-Speed 3090.25 samples/sec   Loss 14.4447   LearningRate 0.0901   Epoch: 1   Global Step: 12660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:57:09,082-Speed 3093.53 samples/sec   Loss 14.7249   LearningRate 0.0901   Epoch: 1   Global Step: 12670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:57:12,814-Speed 2744.38 samples/sec   Loss 14.9808   LearningRate 0.0901   Epoch: 1   Global Step: 12680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:57:16,198-Speed 3027.48 samples/sec   Loss 14.8378   LearningRate 0.0900   Epoch: 1   Global Step: 12690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:57:19,541-Speed 3063.87 samples/sec   Loss 14.8591   LearningRate 0.0900   Epoch: 1   Global Step: 12700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:57:22,839-Speed 3106.32 samples/sec   Loss 14.7530   LearningRate 0.0900   Epoch: 1   Global Step: 12710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:57:26,157-Speed 3086.17 samples/sec   Loss 14.7569   LearningRate 0.0900   Epoch: 1   Global Step: 12720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:57:29,522-Speed 3044.46 samples/sec   Loss 14.6261   LearningRate 0.0900   Epoch: 1   Global Step: 12730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:57:32,813-Speed 3112.43 samples/sec   Loss 14.7766   LearningRate 0.0900   Epoch: 1   Global Step: 12740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:57:36,194-Speed 3029.11 samples/sec   Loss 14.8770   LearningRate 0.0900   Epoch: 1   Global Step: 12750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:57:39,571-Speed 3034.48 samples/sec   Loss 14.8192   LearningRate 0.0900   Epoch: 1   Global Step: 12760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:57:42,966-Speed 3016.54 samples/sec   Loss 14.8349   LearningRate 0.0900   Epoch: 1   Global Step: 12770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:57:46,291-Speed 3080.25 samples/sec   Loss 14.6880   LearningRate 0.0900   Epoch: 1   Global Step: 12780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:57:49,626-Speed 3071.49 samples/sec   Loss 14.8086   LearningRate 0.0900   Epoch: 1   Global Step: 12790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:57:52,985-Speed 3050.39 samples/sec   Loss 14.8144   LearningRate 0.0900   Epoch: 1   Global Step: 12800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:57:56,300-Speed 3089.43 samples/sec   Loss 14.9611   LearningRate 0.0900   Epoch: 1   Global Step: 12810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:57:59,625-Speed 3081.22 samples/sec   Loss 14.9227   LearningRate 0.0899   Epoch: 1   Global Step: 12820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:02,936-Speed 3093.63 samples/sec   Loss 14.9270   LearningRate 0.0899   Epoch: 1   Global Step: 12830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:06,245-Speed 3095.11 samples/sec   Loss 14.8815   LearningRate 0.0899   Epoch: 1   Global Step: 12840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:09,560-Speed 3090.56 samples/sec   Loss 14.8677   LearningRate 0.0899   Epoch: 1   Global Step: 12850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:12,914-Speed 3053.77 samples/sec   Loss 15.0331   LearningRate 0.0899   Epoch: 1   Global Step: 12860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:16,234-Speed 3085.55 samples/sec   Loss 14.9208   LearningRate 0.0899   Epoch: 1   Global Step: 12870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:19,563-Speed 3076.94 samples/sec   Loss 14.9253   LearningRate 0.0899   Epoch: 1   Global Step: 12880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:22,886-Speed 3082.04 samples/sec   Loss 14.9328   LearningRate 0.0899   Epoch: 1   Global Step: 12890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:26,222-Speed 3071.36 samples/sec   Loss 14.8005   LearningRate 0.0899   Epoch: 1   Global Step: 12900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:29,568-Speed 3061.16 samples/sec   Loss 15.0490   LearningRate 0.0899   Epoch: 1   Global Step: 12910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:32,932-Speed 3045.08 samples/sec   Loss 14.9112   LearningRate 0.0899   Epoch: 1   Global Step: 12920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:36,222-Speed 3114.12 samples/sec   Loss 14.8653   LearningRate 0.0899   Epoch: 1   Global Step: 12930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:39,599-Speed 3033.18 samples/sec   Loss 14.8855   LearningRate 0.0899   Epoch: 1   Global Step: 12940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:42,898-Speed 3104.61 samples/sec   Loss 14.9197   LearningRate 0.0898   Epoch: 1   Global Step: 12950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:46,221-Speed 3081.99 samples/sec   Loss 14.9560   LearningRate 0.0898   Epoch: 1   Global Step: 12960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:49,542-Speed 3084.65 samples/sec   Loss 14.8906   LearningRate 0.0898   Epoch: 1   Global Step: 12970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:58:52,853-Speed 3094.00 samples/sec   Loss 14.9628   LearningRate 0.0898   Epoch: 1   Global Step: 12980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:58:56,154-Speed 3103.16 samples/sec   Loss 15.0403   LearningRate 0.0898   Epoch: 1   Global Step: 12990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:58:59,406-Speed 3149.81 samples/sec   Loss 15.0044   LearningRate 0.0898   Epoch: 1   Global Step: 13000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:02,660-Speed 3148.07 samples/sec   Loss 14.9806   LearningRate 0.0898   Epoch: 1   Global Step: 13010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:05,962-Speed 3101.59 samples/sec   Loss 15.0199   LearningRate 0.0898   Epoch: 1   Global Step: 13020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:09,290-Speed 3078.25 samples/sec   Loss 15.0194   LearningRate 0.0898   Epoch: 1   Global Step: 13030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:12,592-Speed 3102.46 samples/sec   Loss 14.9975   LearningRate 0.0898   Epoch: 1   Global Step: 13040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:15,922-Speed 3075.51 samples/sec   Loss 15.1050   LearningRate 0.0898   Epoch: 1   Global Step: 13050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:19,261-Speed 3067.49 samples/sec   Loss 14.9692   LearningRate 0.0898   Epoch: 1   Global Step: 13060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:22,517-Speed 3146.48 samples/sec   Loss 15.1222   LearningRate 0.0898   Epoch: 1   Global Step: 13070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:25,795-Speed 3124.95 samples/sec   Loss 15.0411   LearningRate 0.0897   Epoch: 1   Global Step: 13080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:59:29,063-Speed 3134.80 samples/sec   Loss 14.9702   LearningRate 0.0897   Epoch: 1   Global Step: 13090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:59:32,342-Speed 3124.17 samples/sec   Loss 15.1052   LearningRate 0.0897   Epoch: 1   Global Step: 13100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:59:35,594-Speed 3148.94 samples/sec   Loss 15.0715   LearningRate 0.0897   Epoch: 1   Global Step: 13110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:59:38,866-Speed 3131.72 samples/sec   Loss 14.9595   LearningRate 0.0897   Epoch: 1   Global Step: 13120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:59:42,129-Speed 3138.95 samples/sec   Loss 14.9461   LearningRate 0.0897   Epoch: 1   Global Step: 13130   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:45,407-Speed 3124.24 samples/sec   Loss 15.1797   LearningRate 0.0897   Epoch: 1   Global Step: 13140   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:48,697-Speed 3113.64 samples/sec   Loss 15.1761   LearningRate 0.0897   Epoch: 1   Global Step: 13150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:51,987-Speed 3113.23 samples/sec   Loss 15.0283   LearningRate 0.0897   Epoch: 1   Global Step: 13160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:55,377-Speed 3021.90 samples/sec   Loss 15.1091   LearningRate 0.0897   Epoch: 1   Global Step: 13170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:59:58,705-Speed 3078.68 samples/sec   Loss 15.2196   LearningRate 0.0897   Epoch: 1   Global Step: 13180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:00:02,083-Speed 3032.48 samples/sec   Loss 15.0981   LearningRate 0.0897   Epoch: 1   Global Step: 13190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:00:05,421-Speed 3068.08 samples/sec   Loss 15.0690   LearningRate 0.0897   Epoch: 1   Global Step: 13200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:00:08,740-Speed 3086.22 samples/sec   Loss 15.1177   LearningRate 0.0896   Epoch: 1   Global Step: 13210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:00:12,045-Speed 3099.62 samples/sec   Loss 15.2257   LearningRate 0.0896   Epoch: 1   Global Step: 13220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:00:15,475-Speed 2986.02 samples/sec   Loss 15.2738   LearningRate 0.0896   Epoch: 1   Global Step: 13230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:00:18,759-Speed 3119.27 samples/sec   Loss 15.2274   LearningRate 0.0896   Epoch: 1   Global Step: 13240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:00:22,129-Speed 3039.71 samples/sec   Loss 15.1597   LearningRate 0.0896   Epoch: 1   Global Step: 13250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:00:25,459-Speed 3075.26 samples/sec   Loss 15.1568   LearningRate 0.0896   Epoch: 1   Global Step: 13260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:00:28,776-Speed 3088.31 samples/sec   Loss 15.2955   LearningRate 0.0896   Epoch: 1   Global Step: 13270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:00:32,135-Speed 3050.09 samples/sec   Loss 15.3286   LearningRate 0.0896   Epoch: 1   Global Step: 13280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:00:35,477-Speed 3064.53 samples/sec   Loss 15.1449   LearningRate 0.0896   Epoch: 1   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:00:38,785-Speed 3096.18 samples/sec   Loss 15.0646   LearningRate 0.0896   Epoch: 1   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:00:42,106-Speed 3084.24 samples/sec   Loss 15.1554   LearningRate 0.0896   Epoch: 1   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:00:45,375-Speed 3133.41 samples/sec   Loss 15.1456   LearningRate 0.0896   Epoch: 1   Global Step: 13320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:00:48,658-Speed 3120.08 samples/sec   Loss 15.2742   LearningRate 0.0896   Epoch: 1   Global Step: 13330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:00:51,940-Speed 3121.11 samples/sec   Loss 15.1303   LearningRate 0.0895   Epoch: 1   Global Step: 13340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:00:55,258-Speed 3087.43 samples/sec   Loss 15.1286   LearningRate 0.0895   Epoch: 1   Global Step: 13350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:00:58,533-Speed 3126.95 samples/sec   Loss 15.2865   LearningRate 0.0895   Epoch: 1   Global Step: 13360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:01:01,869-Speed 3070.32 samples/sec   Loss 15.1967   LearningRate 0.0895   Epoch: 1   Global Step: 13370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:01:05,147-Speed 3125.32 samples/sec   Loss 15.3553   LearningRate 0.0895   Epoch: 1   Global Step: 13380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:01:08,482-Speed 3071.07 samples/sec   Loss 15.1848   LearningRate 0.0895   Epoch: 1   Global Step: 13390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:01:11,785-Speed 3101.40 samples/sec   Loss 15.1947   LearningRate 0.0895   Epoch: 1   Global Step: 13400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:01:15,190-Speed 3008.67 samples/sec   Loss 15.3134   LearningRate 0.0895   Epoch: 1   Global Step: 13410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:01:18,488-Speed 3105.30 samples/sec   Loss 15.2959   LearningRate 0.0895   Epoch: 1   Global Step: 13420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:01:21,784-Speed 3107.74 samples/sec   Loss 15.2614   LearningRate 0.0895   Epoch: 1   Global Step: 13430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:01:25,113-Speed 3076.78 samples/sec   Loss 15.2290   LearningRate 0.0895   Epoch: 1   Global Step: 13440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:01:28,379-Speed 3136.74 samples/sec   Loss 15.2232   LearningRate 0.0895   Epoch: 1   Global Step: 13450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:01:31,668-Speed 3114.89 samples/sec   Loss 15.2027   LearningRate 0.0895   Epoch: 1   Global Step: 13460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:01:35,005-Speed 3068.82 samples/sec   Loss 15.2724   LearningRate 0.0894   Epoch: 1   Global Step: 13470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:01:38,281-Speed 3127.71 samples/sec   Loss 15.2649   LearningRate 0.0894   Epoch: 1   Global Step: 13480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:01:41,573-Speed 3111.06 samples/sec   Loss 15.4311   LearningRate 0.0894   Epoch: 1   Global Step: 13490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:01:44,873-Speed 3103.92 samples/sec   Loss 15.2016   LearningRate 0.0894   Epoch: 1   Global Step: 13500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:01:48,192-Speed 3085.66 samples/sec   Loss 15.0670   LearningRate 0.0894   Epoch: 1   Global Step: 13510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:01:51,529-Speed 3070.20 samples/sec   Loss 15.2891   LearningRate 0.0894   Epoch: 1   Global Step: 13520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:01:54,860-Speed 3075.06 samples/sec   Loss 15.4165   LearningRate 0.0894   Epoch: 1   Global Step: 13530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:01:58,138-Speed 3124.95 samples/sec   Loss 15.3444   LearningRate 0.0894   Epoch: 1   Global Step: 13540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:01,454-Speed 3088.54 samples/sec   Loss 15.2358   LearningRate 0.0894   Epoch: 1   Global Step: 13550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:04,769-Speed 3089.72 samples/sec   Loss 15.3371   LearningRate 0.0894   Epoch: 1   Global Step: 13560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:08,117-Speed 3059.24 samples/sec   Loss 15.3004   LearningRate 0.0894   Epoch: 1   Global Step: 13570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:11,456-Speed 3067.81 samples/sec   Loss 15.2648   LearningRate 0.0894   Epoch: 1   Global Step: 13580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:14,791-Speed 3071.78 samples/sec   Loss 15.3541   LearningRate 0.0894   Epoch: 1   Global Step: 13590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:18,120-Speed 3076.79 samples/sec   Loss 15.2768   LearningRate 0.0894   Epoch: 1   Global Step: 13600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:21,414-Speed 3109.76 samples/sec   Loss 15.3392   LearningRate 0.0893   Epoch: 1   Global Step: 13610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:24,736-Speed 3082.56 samples/sec   Loss 15.2089   LearningRate 0.0893   Epoch: 1   Global Step: 13620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:28,041-Speed 3099.84 samples/sec   Loss 15.2897   LearningRate 0.0893   Epoch: 1   Global Step: 13630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:31,370-Speed 3076.57 samples/sec   Loss 15.5311   LearningRate 0.0893   Epoch: 1   Global Step: 13640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:34,651-Speed 3121.93 samples/sec   Loss 15.3070   LearningRate 0.0893   Epoch: 1   Global Step: 13650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:37,962-Speed 3093.86 samples/sec   Loss 15.4292   LearningRate 0.0893   Epoch: 1   Global Step: 13660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:41,280-Speed 3087.54 samples/sec   Loss 15.3525   LearningRate 0.0893   Epoch: 1   Global Step: 13670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:44,579-Speed 3105.11 samples/sec   Loss 15.3331   LearningRate 0.0893   Epoch: 1   Global Step: 13680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:47,885-Speed 3098.09 samples/sec   Loss 15.2476   LearningRate 0.0893   Epoch: 1   Global Step: 13690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:51,175-Speed 3113.21 samples/sec   Loss 15.3441   LearningRate 0.0893   Epoch: 1   Global Step: 13700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:54,518-Speed 3063.84 samples/sec   Loss 15.2595   LearningRate 0.0893   Epoch: 1   Global Step: 13710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:02:57,834-Speed 3089.34 samples/sec   Loss 15.1712   LearningRate 0.0893   Epoch: 1   Global Step: 13720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:01,239-Speed 3008.83 samples/sec   Loss 15.2991   LearningRate 0.0893   Epoch: 1   Global Step: 13730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:04,596-Speed 3051.73 samples/sec   Loss 15.5060   LearningRate 0.0892   Epoch: 1   Global Step: 13740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:07,894-Speed 3105.62 samples/sec   Loss 15.3321   LearningRate 0.0892   Epoch: 1   Global Step: 13750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:11,246-Speed 3056.17 samples/sec   Loss 15.2763   LearningRate 0.0892   Epoch: 1   Global Step: 13760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:14,544-Speed 3105.26 samples/sec   Loss 15.3188   LearningRate 0.0892   Epoch: 1   Global Step: 13770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:17,865-Speed 3084.96 samples/sec   Loss 15.3197   LearningRate 0.0892   Epoch: 1   Global Step: 13780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:21,152-Speed 3115.68 samples/sec   Loss 15.4541   LearningRate 0.0892   Epoch: 1   Global Step: 13790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:24,446-Speed 3109.78 samples/sec   Loss 15.2682   LearningRate 0.0892   Epoch: 1   Global Step: 13800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:27,710-Speed 3138.23 samples/sec   Loss 15.4309   LearningRate 0.0892   Epoch: 1   Global Step: 13810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:31,049-Speed 3068.07 samples/sec   Loss 15.4095   LearningRate 0.0892   Epoch: 1   Global Step: 13820   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 03:03:34,301-Speed 3149.29 samples/sec   Loss 15.3642   LearningRate 0.0892   Epoch: 1   Global Step: 13830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:37,647-Speed 3061.35 samples/sec   Loss 15.5513   LearningRate 0.0892   Epoch: 1   Global Step: 13840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:40,960-Speed 3091.89 samples/sec   Loss 15.4691   LearningRate 0.0892   Epoch: 1   Global Step: 13850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:44,284-Speed 3081.77 samples/sec   Loss 15.5627   LearningRate 0.0892   Epoch: 1   Global Step: 13860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:47,572-Speed 3115.12 samples/sec   Loss 15.5072   LearningRate 0.0891   Epoch: 1   Global Step: 13870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:50,893-Speed 3084.41 samples/sec   Loss 15.5746   LearningRate 0.0891   Epoch: 1   Global Step: 13880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:54,228-Speed 3071.26 samples/sec   Loss 15.3822   LearningRate 0.0891   Epoch: 1   Global Step: 13890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:03:57,601-Speed 3036.97 samples/sec   Loss 15.5090   LearningRate 0.0891   Epoch: 1   Global Step: 13900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:00,970-Speed 3040.14 samples/sec   Loss 15.5658   LearningRate 0.0891   Epoch: 1   Global Step: 13910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:04,261-Speed 3112.97 samples/sec   Loss 15.3146   LearningRate 0.0891   Epoch: 1   Global Step: 13920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:07,636-Speed 3034.77 samples/sec   Loss 15.3316   LearningRate 0.0891   Epoch: 1   Global Step: 13930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:10,941-Speed 3099.36 samples/sec   Loss 15.3651   LearningRate 0.0891   Epoch: 1   Global Step: 13940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:14,253-Speed 3092.26 samples/sec   Loss 15.4127   LearningRate 0.0891   Epoch: 1   Global Step: 13950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:17,608-Speed 3053.32 samples/sec   Loss 15.4631   LearningRate 0.0891   Epoch: 1   Global Step: 13960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:20,936-Speed 3077.21 samples/sec   Loss 15.2476   LearningRate 0.0891   Epoch: 1   Global Step: 13970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:24,304-Speed 3041.96 samples/sec   Loss 15.3557   LearningRate 0.0891   Epoch: 1   Global Step: 13980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:27,619-Speed 3089.56 samples/sec   Loss 15.3105   LearningRate 0.0891   Epoch: 1   Global Step: 13990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:30,910-Speed 3112.88 samples/sec   Loss 15.3819   LearningRate 0.0890   Epoch: 1   Global Step: 14000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:34,232-Speed 3082.63 samples/sec   Loss 15.4866   LearningRate 0.0890   Epoch: 1   Global Step: 14010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:37,588-Speed 3052.53 samples/sec   Loss 15.4351   LearningRate 0.0890   Epoch: 1   Global Step: 14020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:40,844-Speed 3145.74 samples/sec   Loss 15.3783   LearningRate 0.0890   Epoch: 1   Global Step: 14030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:44,131-Speed 3116.43 samples/sec   Loss 15.3146   LearningRate 0.0890   Epoch: 1   Global Step: 14040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:47,443-Speed 3092.85 samples/sec   Loss 15.4759   LearningRate 0.0890   Epoch: 1   Global Step: 14050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:50,760-Speed 3088.23 samples/sec   Loss 15.4416   LearningRate 0.0890   Epoch: 1   Global Step: 14060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:54,058-Speed 3105.51 samples/sec   Loss 15.5031   LearningRate 0.0890   Epoch: 1   Global Step: 14070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:04:57,359-Speed 3103.78 samples/sec   Loss 15.2425   LearningRate 0.0890   Epoch: 1   Global Step: 14080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:05:00,674-Speed 3089.02 samples/sec   Loss 15.3822   LearningRate 0.0890   Epoch: 1   Global Step: 14090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:05:04,000-Speed 3079.82 samples/sec   Loss 15.3637   LearningRate 0.0890   Epoch: 1   Global Step: 14100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:05:07,291-Speed 3113.47 samples/sec   Loss 15.4101   LearningRate 0.0890   Epoch: 1   Global Step: 14110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:05:10,577-Speed 3116.90 samples/sec   Loss 15.3795   LearningRate 0.0890   Epoch: 1   Global Step: 14120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:05:13,876-Speed 3105.07 samples/sec   Loss 15.3480   LearningRate 0.0889   Epoch: 1   Global Step: 14130   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:05:17,198-Speed 3082.96 samples/sec   Loss 15.1855   LearningRate 0.0889   Epoch: 1   Global Step: 14140   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:05:20,490-Speed 3111.96 samples/sec   Loss 15.4294   LearningRate 0.0889   Epoch: 1   Global Step: 14150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:05:23,771-Speed 3121.49 samples/sec   Loss 15.4681   LearningRate 0.0889   Epoch: 1   Global Step: 14160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:05:27,134-Speed 3046.64 samples/sec   Loss 15.4432   LearningRate 0.0889   Epoch: 1   Global Step: 14170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:05:30,437-Speed 3101.16 samples/sec   Loss 15.3589   LearningRate 0.0889   Epoch: 1   Global Step: 14180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:05:33,761-Speed 3082.38 samples/sec   Loss 15.4320   LearningRate 0.0889   Epoch: 1   Global Step: 14190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:05:37,056-Speed 3108.70 samples/sec   Loss 15.4694   LearningRate 0.0889   Epoch: 1   Global Step: 14200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:05:40,417-Speed 3047.33 samples/sec   Loss 15.4563   LearningRate 0.0889   Epoch: 1   Global Step: 14210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:05:43,779-Speed 3046.54 samples/sec   Loss 15.5992   LearningRate 0.0889   Epoch: 1   Global Step: 14220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:05:47,159-Speed 3030.30 samples/sec   Loss 15.5474   LearningRate 0.0889   Epoch: 1   Global Step: 14230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:05:50,506-Speed 3060.59 samples/sec   Loss 15.4742   LearningRate 0.0889   Epoch: 1   Global Step: 14240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:05:53,840-Speed 3072.32 samples/sec   Loss 15.4804   LearningRate 0.0889   Epoch: 1   Global Step: 14250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:05:57,121-Speed 3121.45 samples/sec   Loss 15.3776   LearningRate 0.0888   Epoch: 1   Global Step: 14260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:06:00,484-Speed 3045.72 samples/sec   Loss 15.3304   LearningRate 0.0888   Epoch: 1   Global Step: 14270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:06:03,823-Speed 3067.77 samples/sec   Loss 15.5872   LearningRate 0.0888   Epoch: 1   Global Step: 14280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:06:07,174-Speed 3056.92 samples/sec   Loss 15.3847   LearningRate 0.0888   Epoch: 1   Global Step: 14290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:06:10,483-Speed 3095.99 samples/sec   Loss 15.3499   LearningRate 0.0888   Epoch: 1   Global Step: 14300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:06:13,789-Speed 3098.24 samples/sec   Loss 15.5344   LearningRate 0.0888   Epoch: 1   Global Step: 14310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:06:17,106-Speed 3088.63 samples/sec   Loss 15.4017   LearningRate 0.0888   Epoch: 1   Global Step: 14320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:06:20,460-Speed 3053.33 samples/sec   Loss 15.3367   LearningRate 0.0888   Epoch: 1   Global Step: 14330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:06:23,769-Speed 3096.57 samples/sec   Loss 15.2813   LearningRate 0.0888   Epoch: 1   Global Step: 14340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:06:27,079-Speed 3094.59 samples/sec   Loss 15.6328   LearningRate 0.0888   Epoch: 1   Global Step: 14350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:06:30,400-Speed 3083.65 samples/sec   Loss 15.4649   LearningRate 0.0888   Epoch: 1   Global Step: 14360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:06:33,680-Speed 3123.42 samples/sec   Loss 15.4253   LearningRate 0.0888   Epoch: 1   Global Step: 14370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:06:37,010-Speed 3075.61 samples/sec   Loss 15.3773   LearningRate 0.0888   Epoch: 1   Global Step: 14380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:06:40,372-Speed 3046.73 samples/sec   Loss 15.4404   LearningRate 0.0888   Epoch: 1   Global Step: 14390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:06:43,742-Speed 3039.79 samples/sec   Loss 15.3777   LearningRate 0.0887   Epoch: 1   Global Step: 14400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:06:47,083-Speed 3065.63 samples/sec   Loss 15.3095   LearningRate 0.0887   Epoch: 1   Global Step: 14410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:06:50,454-Speed 3038.85 samples/sec   Loss 15.6235   LearningRate 0.0887   Epoch: 1   Global Step: 14420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:06:53,820-Speed 3042.90 samples/sec   Loss 15.4072   LearningRate 0.0887   Epoch: 1   Global Step: 14430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:06:57,122-Speed 3102.59 samples/sec   Loss 15.5801   LearningRate 0.0887   Epoch: 1   Global Step: 14440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:00,458-Speed 3070.79 samples/sec   Loss 15.4274   LearningRate 0.0887   Epoch: 1   Global Step: 14450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:03,868-Speed 3003.63 samples/sec   Loss 15.4109   LearningRate 0.0887   Epoch: 1   Global Step: 14460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:07,183-Speed 3089.95 samples/sec   Loss 15.4790   LearningRate 0.0887   Epoch: 1   Global Step: 14470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:10,557-Speed 3035.90 samples/sec   Loss 15.3935   LearningRate 0.0887   Epoch: 1   Global Step: 14480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:13,942-Speed 3025.33 samples/sec   Loss 15.3084   LearningRate 0.0887   Epoch: 1   Global Step: 14490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:17,196-Speed 3148.26 samples/sec   Loss 15.4054   LearningRate 0.0887   Epoch: 1   Global Step: 14500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:20,540-Speed 3063.40 samples/sec   Loss 15.3545   LearningRate 0.0887   Epoch: 1   Global Step: 14510   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 03:07:23,814-Speed 3128.18 samples/sec   Loss 15.4571   LearningRate 0.0887   Epoch: 1   Global Step: 14520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:27,069-Speed 3148.38 samples/sec   Loss 15.6245   LearningRate 0.0886   Epoch: 1   Global Step: 14530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:30,416-Speed 3059.68 samples/sec   Loss 15.4505   LearningRate 0.0886   Epoch: 1   Global Step: 14540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:33,742-Speed 3079.96 samples/sec   Loss 15.4376   LearningRate 0.0886   Epoch: 1   Global Step: 14550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:37,093-Speed 3057.23 samples/sec   Loss 15.5399   LearningRate 0.0886   Epoch: 1   Global Step: 14560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:40,383-Speed 3113.20 samples/sec   Loss 15.4970   LearningRate 0.0886   Epoch: 1   Global Step: 14570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:43,731-Speed 3060.17 samples/sec   Loss 15.4645   LearningRate 0.0886   Epoch: 1   Global Step: 14580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:07:47,032-Speed 3102.32 samples/sec   Loss 15.4885   LearningRate 0.0886   Epoch: 1   Global Step: 14590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:07:50,402-Speed 3039.57 samples/sec   Loss 15.5237   LearningRate 0.0886   Epoch: 1   Global Step: 14600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:07:53,809-Speed 3006.72 samples/sec   Loss 15.4580   LearningRate 0.0886   Epoch: 1   Global Step: 14610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:07:57,192-Speed 3027.76 samples/sec   Loss 15.4035   LearningRate 0.0886   Epoch: 1   Global Step: 14620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:00,522-Speed 3076.43 samples/sec   Loss 15.3933   LearningRate 0.0886   Epoch: 1   Global Step: 14630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:03,955-Speed 2983.54 samples/sec   Loss 15.5051   LearningRate 0.0886   Epoch: 1   Global Step: 14640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:07,282-Speed 3079.10 samples/sec   Loss 15.4415   LearningRate 0.0886   Epoch: 1   Global Step: 14650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:10,643-Speed 3047.71 samples/sec   Loss 15.5421   LearningRate 0.0885   Epoch: 1   Global Step: 14660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:14,047-Speed 3008.97 samples/sec   Loss 15.5142   LearningRate 0.0885   Epoch: 1   Global Step: 14670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:17,359-Speed 3092.16 samples/sec   Loss 15.4905   LearningRate 0.0885   Epoch: 1   Global Step: 14680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:20,734-Speed 3034.97 samples/sec   Loss 15.3741   LearningRate 0.0885   Epoch: 1   Global Step: 14690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:08:24,147-Speed 3001.55 samples/sec   Loss 15.5480   LearningRate 0.0885   Epoch: 1   Global Step: 14700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:08:27,486-Speed 3069.54 samples/sec   Loss 15.4947   LearningRate 0.0885   Epoch: 1   Global Step: 14710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:08:30,791-Speed 3099.58 samples/sec   Loss 15.3588   LearningRate 0.0885   Epoch: 1   Global Step: 14720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:08:34,073-Speed 3120.99 samples/sec   Loss 15.4160   LearningRate 0.0885   Epoch: 1   Global Step: 14730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:08:37,436-Speed 3046.33 samples/sec   Loss 15.4301   LearningRate 0.0885   Epoch: 1   Global Step: 14740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:40,744-Speed 3096.04 samples/sec   Loss 15.3052   LearningRate 0.0885   Epoch: 1   Global Step: 14750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:44,049-Speed 3099.15 samples/sec   Loss 15.3785   LearningRate 0.0885   Epoch: 1   Global Step: 14760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:47,340-Speed 3113.03 samples/sec   Loss 15.4523   LearningRate 0.0885   Epoch: 1   Global Step: 14770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:50,689-Speed 3058.10 samples/sec   Loss 15.5006   LearningRate 0.0885   Epoch: 1   Global Step: 14780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:54,038-Speed 3059.01 samples/sec   Loss 15.5657   LearningRate 0.0884   Epoch: 1   Global Step: 14790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:08:57,360-Speed 3083.61 samples/sec   Loss 15.4867   LearningRate 0.0884   Epoch: 1   Global Step: 14800   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:09:00,729-Speed 3040.16 samples/sec   Loss 15.3469   LearningRate 0.0884   Epoch: 1   Global Step: 14810   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:09:04,074-Speed 3061.93 samples/sec   Loss 15.4431   LearningRate 0.0884   Epoch: 1   Global Step: 14820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:09:07,424-Speed 3058.37 samples/sec   Loss 15.5999   LearningRate 0.0884   Epoch: 1   Global Step: 14830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:09:10,753-Speed 3077.18 samples/sec   Loss 15.4332   LearningRate 0.0884   Epoch: 1   Global Step: 14840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:14,068-Speed 3090.54 samples/sec   Loss 15.4075   LearningRate 0.0884   Epoch: 1   Global Step: 14850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:17,421-Speed 3054.29 samples/sec   Loss 15.5630   LearningRate 0.0884   Epoch: 1   Global Step: 14860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:20,751-Speed 3076.03 samples/sec   Loss 15.4616   LearningRate 0.0884   Epoch: 1   Global Step: 14870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:24,083-Speed 3074.29 samples/sec   Loss 15.5662   LearningRate 0.0884   Epoch: 1   Global Step: 14880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:27,374-Speed 3113.01 samples/sec   Loss 15.2796   LearningRate 0.0884   Epoch: 1   Global Step: 14890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:30,664-Speed 3112.94 samples/sec   Loss 15.4683   LearningRate 0.0884   Epoch: 1   Global Step: 14900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:34,006-Speed 3064.86 samples/sec   Loss 15.4387   LearningRate 0.0884   Epoch: 1   Global Step: 14910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:37,431-Speed 2991.55 samples/sec   Loss 15.5216   LearningRate 0.0883   Epoch: 1   Global Step: 14920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:40,769-Speed 3068.45 samples/sec   Loss 15.4650   LearningRate 0.0883   Epoch: 1   Global Step: 14930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:44,109-Speed 3066.21 samples/sec   Loss 15.4037   LearningRate 0.0883   Epoch: 1   Global Step: 14940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:47,425-Speed 3089.75 samples/sec   Loss 15.4396   LearningRate 0.0883   Epoch: 1   Global Step: 14950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:50,742-Speed 3087.81 samples/sec   Loss 15.5564   LearningRate 0.0883   Epoch: 1   Global Step: 14960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:54,024-Speed 3121.26 samples/sec   Loss 15.3816   LearningRate 0.0883   Epoch: 1   Global Step: 14970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:09:57,357-Speed 3072.55 samples/sec   Loss 15.4978   LearningRate 0.0883   Epoch: 1   Global Step: 14980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:10:00,615-Speed 3144.63 samples/sec   Loss 15.3996   LearningRate 0.0883   Epoch: 1   Global Step: 14990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:10:03,909-Speed 3109.70 samples/sec   Loss 15.4200   LearningRate 0.0883   Epoch: 1   Global Step: 15000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:10:07,157-Speed 3153.35 samples/sec   Loss 15.3586   LearningRate 0.0883   Epoch: 1   Global Step: 15010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:10:10,471-Speed 3091.23 samples/sec   Loss 15.4132   LearningRate 0.0883   Epoch: 1   Global Step: 15020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:10:13,877-Speed 3007.46 samples/sec   Loss 15.4707   LearningRate 0.0883   Epoch: 1   Global Step: 15030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:10:17,209-Speed 3074.37 samples/sec   Loss 15.2978   LearningRate 0.0883   Epoch: 1   Global Step: 15040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:10:20,529-Speed 3085.25 samples/sec   Loss 15.3727   LearningRate 0.0883   Epoch: 1   Global Step: 15050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:10:23,846-Speed 3087.36 samples/sec   Loss 15.4553   LearningRate 0.0882   Epoch: 1   Global Step: 15060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:10:27,207-Speed 3047.48 samples/sec   Loss 15.6635   LearningRate 0.0882   Epoch: 1   Global Step: 15070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:10:30,564-Speed 3051.37 samples/sec   Loss 15.3493   LearningRate 0.0882   Epoch: 1   Global Step: 15080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:10:33,920-Speed 3052.57 samples/sec   Loss 15.4685   LearningRate 0.0882   Epoch: 1   Global Step: 15090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:10:37,292-Speed 3037.60 samples/sec   Loss 15.4378   LearningRate 0.0882   Epoch: 1   Global Step: 15100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:10:40,575-Speed 3120.11 samples/sec   Loss 15.3645   LearningRate 0.0882   Epoch: 1   Global Step: 15110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:10:44,013-Speed 2979.53 samples/sec   Loss 15.4916   LearningRate 0.0882   Epoch: 1   Global Step: 15120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:10:47,329-Speed 3088.48 samples/sec   Loss 15.2273   LearningRate 0.0882   Epoch: 1   Global Step: 15130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:10:50,649-Speed 3085.42 samples/sec   Loss 15.4280   LearningRate 0.0882   Epoch: 1   Global Step: 15140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:10:53,966-Speed 3088.14 samples/sec   Loss 15.4808   LearningRate 0.0882   Epoch: 1   Global Step: 15150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:10:57,261-Speed 3108.36 samples/sec   Loss 15.4966   LearningRate 0.0882   Epoch: 1   Global Step: 15160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:00,570-Speed 3095.56 samples/sec   Loss 15.4977   LearningRate 0.0882   Epoch: 1   Global Step: 15170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:03,927-Speed 3051.42 samples/sec   Loss 15.3355   LearningRate 0.0882   Epoch: 1   Global Step: 15180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:07,274-Speed 3060.72 samples/sec   Loss 15.3558   LearningRate 0.0881   Epoch: 1   Global Step: 15190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:10,611-Speed 3069.34 samples/sec   Loss 15.5307   LearningRate 0.0881   Epoch: 1   Global Step: 15200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:13,999-Speed 3023.68 samples/sec   Loss 15.4670   LearningRate 0.0881   Epoch: 1   Global Step: 15210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:17,283-Speed 3118.22 samples/sec   Loss 15.4985   LearningRate 0.0881   Epoch: 1   Global Step: 15220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:20,588-Speed 3099.43 samples/sec   Loss 15.3697   LearningRate 0.0881   Epoch: 1   Global Step: 15230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:23,920-Speed 3074.53 samples/sec   Loss 15.4904   LearningRate 0.0881   Epoch: 1   Global Step: 15240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:27,218-Speed 3105.73 samples/sec   Loss 15.3399   LearningRate 0.0881   Epoch: 1   Global Step: 15250   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:30,489-Speed 3131.55 samples/sec   Loss 15.3573   LearningRate 0.0881   Epoch: 1   Global Step: 15260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:11:33,846-Speed 3050.82 samples/sec   Loss 15.3555   LearningRate 0.0881   Epoch: 1   Global Step: 15270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:11:37,181-Speed 3071.45 samples/sec   Loss 15.2663   LearningRate 0.0881   Epoch: 1   Global Step: 15280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:11:40,466-Speed 3118.07 samples/sec   Loss 15.3818   LearningRate 0.0881   Epoch: 1   Global Step: 15290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:11:43,821-Speed 3052.45 samples/sec   Loss 15.3902   LearningRate 0.0881   Epoch: 1   Global Step: 15300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:11:47,186-Speed 3045.04 samples/sec   Loss 15.5335   LearningRate 0.0881   Epoch: 1   Global Step: 15310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:50,539-Speed 3054.10 samples/sec   Loss 15.5366   LearningRate 0.0880   Epoch: 1   Global Step: 15320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:53,961-Speed 2993.87 samples/sec   Loss 15.4498   LearningRate 0.0880   Epoch: 1   Global Step: 15330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:11:57,354-Speed 3018.81 samples/sec   Loss 15.3666   LearningRate 0.0880   Epoch: 1   Global Step: 15340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:12:00,672-Speed 3087.07 samples/sec   Loss 15.4831   LearningRate 0.0880   Epoch: 1   Global Step: 15350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:12:04,042-Speed 3039.79 samples/sec   Loss 15.5878   LearningRate 0.0880   Epoch: 1   Global Step: 15360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:12:07,313-Speed 3131.15 samples/sec   Loss 15.3518   LearningRate 0.0880   Epoch: 1   Global Step: 15370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:12:10,630-Speed 3088.38 samples/sec   Loss 15.4739   LearningRate 0.0880   Epoch: 1   Global Step: 15380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:12:13,971-Speed 3065.47 samples/sec   Loss 15.6324   LearningRate 0.0880   Epoch: 1   Global Step: 15390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:12:17,344-Speed 3037.03 samples/sec   Loss 15.3855   LearningRate 0.0880   Epoch: 1   Global Step: 15400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:12:20,661-Speed 3088.65 samples/sec   Loss 15.4529   LearningRate 0.0880   Epoch: 1   Global Step: 15410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:12:24,034-Speed 3036.51 samples/sec   Loss 15.5002   LearningRate 0.0880   Epoch: 1   Global Step: 15420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:12:27,342-Speed 3096.46 samples/sec   Loss 15.2615   LearningRate 0.0880   Epoch: 1   Global Step: 15430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:12:30,699-Speed 3051.47 samples/sec   Loss 15.5660   LearningRate 0.0880   Epoch: 1   Global Step: 15440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:12:34,069-Speed 3039.30 samples/sec   Loss 15.5052   LearningRate 0.0879   Epoch: 1   Global Step: 15450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:12:37,343-Speed 3128.98 samples/sec   Loss 15.3555   LearningRate 0.0879   Epoch: 1   Global Step: 15460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:12:40,712-Speed 3040.27 samples/sec   Loss 15.3593   LearningRate 0.0879   Epoch: 1   Global Step: 15470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:12:44,065-Speed 3055.13 samples/sec   Loss 15.4546   LearningRate 0.0879   Epoch: 1   Global Step: 15480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:12:47,365-Speed 3104.93 samples/sec   Loss 15.4100   LearningRate 0.0879   Epoch: 1   Global Step: 15490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:12:50,664-Speed 3104.57 samples/sec   Loss 15.4764   LearningRate 0.0879   Epoch: 1   Global Step: 15500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:12:53,995-Speed 3074.71 samples/sec   Loss 15.5998   LearningRate 0.0879   Epoch: 1   Global Step: 15510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:12:57,304-Speed 3095.71 samples/sec   Loss 15.4129   LearningRate 0.0879   Epoch: 1   Global Step: 15520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:00,648-Speed 3063.58 samples/sec   Loss 15.4968   LearningRate 0.0879   Epoch: 1   Global Step: 15530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:03,996-Speed 3058.87 samples/sec   Loss 15.2806   LearningRate 0.0879   Epoch: 1   Global Step: 15540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:07,343-Speed 3060.82 samples/sec   Loss 15.4716   LearningRate 0.0879   Epoch: 1   Global Step: 15550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:10,668-Speed 3080.13 samples/sec   Loss 15.4055   LearningRate 0.0879   Epoch: 1   Global Step: 15560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:13,978-Speed 3094.50 samples/sec   Loss 15.4458   LearningRate 0.0879   Epoch: 1   Global Step: 15570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:17,259-Speed 3122.18 samples/sec   Loss 15.5451   LearningRate 0.0879   Epoch: 1   Global Step: 15580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:20,552-Speed 3110.47 samples/sec   Loss 15.4825   LearningRate 0.0878   Epoch: 1   Global Step: 15590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:23,861-Speed 3094.94 samples/sec   Loss 15.3750   LearningRate 0.0878   Epoch: 1   Global Step: 15600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:27,146-Speed 3118.39 samples/sec   Loss 15.5041   LearningRate 0.0878   Epoch: 1   Global Step: 15610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:30,478-Speed 3075.32 samples/sec   Loss 15.4263   LearningRate 0.0878   Epoch: 1   Global Step: 15620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:33,793-Speed 3090.51 samples/sec   Loss 15.4014   LearningRate 0.0878   Epoch: 1   Global Step: 15630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:37,074-Speed 3121.67 samples/sec   Loss 15.4556   LearningRate 0.0878   Epoch: 1   Global Step: 15640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:40,345-Speed 3131.88 samples/sec   Loss 15.5355   LearningRate 0.0878   Epoch: 1   Global Step: 15650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:43,698-Speed 3054.43 samples/sec   Loss 15.3735   LearningRate 0.0878   Epoch: 1   Global Step: 15660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:47,071-Speed 3036.61 samples/sec   Loss 15.2502   LearningRate 0.0878   Epoch: 1   Global Step: 15670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:50,431-Speed 3049.36 samples/sec   Loss 15.4898   LearningRate 0.0878   Epoch: 1   Global Step: 15680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:53,725-Speed 3109.05 samples/sec   Loss 15.2022   LearningRate 0.0878   Epoch: 1   Global Step: 15690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:13:57,038-Speed 3092.34 samples/sec   Loss 15.3751   LearningRate 0.0878   Epoch: 1   Global Step: 15700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:00,388-Speed 3057.67 samples/sec   Loss 15.4367   LearningRate 0.0878   Epoch: 1   Global Step: 15710   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 03:14:03,630-Speed 3159.01 samples/sec   Loss 15.4990   LearningRate 0.0877   Epoch: 1   Global Step: 15720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:06,896-Speed 3136.55 samples/sec   Loss 15.2280   LearningRate 0.0877   Epoch: 1   Global Step: 15730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:10,234-Speed 3068.72 samples/sec   Loss 15.3373   LearningRate 0.0877   Epoch: 1   Global Step: 15740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:13,647-Speed 3001.52 samples/sec   Loss 15.1941   LearningRate 0.0877   Epoch: 1   Global Step: 15750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:16,960-Speed 3091.12 samples/sec   Loss 15.5083   LearningRate 0.0877   Epoch: 1   Global Step: 15760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:20,319-Speed 3049.05 samples/sec   Loss 15.1937   LearningRate 0.0877   Epoch: 1   Global Step: 15770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:23,672-Speed 3055.27 samples/sec   Loss 15.3264   LearningRate 0.0877   Epoch: 1   Global Step: 15780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:26,977-Speed 3099.66 samples/sec   Loss 15.2953   LearningRate 0.0877   Epoch: 1   Global Step: 15790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:30,243-Speed 3135.90 samples/sec   Loss 15.4505   LearningRate 0.0877   Epoch: 1   Global Step: 15800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:33,568-Speed 3081.07 samples/sec   Loss 15.3647   LearningRate 0.0877   Epoch: 1   Global Step: 15810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:36,892-Speed 3081.29 samples/sec   Loss 15.3563   LearningRate 0.0877   Epoch: 1   Global Step: 15820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:40,160-Speed 3133.83 samples/sec   Loss 15.1664   LearningRate 0.0877   Epoch: 1   Global Step: 15830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:43,494-Speed 3072.74 samples/sec   Loss 15.3219   LearningRate 0.0877   Epoch: 1   Global Step: 15840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:46,845-Speed 3056.62 samples/sec   Loss 15.4584   LearningRate 0.0876   Epoch: 1   Global Step: 15850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:50,174-Speed 3076.71 samples/sec   Loss 15.4009   LearningRate 0.0876   Epoch: 1   Global Step: 15860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:53,502-Speed 3078.47 samples/sec   Loss 15.2413   LearningRate 0.0876   Epoch: 1   Global Step: 15870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:14:56,847-Speed 3061.41 samples/sec   Loss 15.4570   LearningRate 0.0876   Epoch: 1   Global Step: 15880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:15:00,148-Speed 3103.89 samples/sec   Loss 15.2431   LearningRate 0.0876   Epoch: 1   Global Step: 15890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:15:03,484-Speed 3070.68 samples/sec   Loss 15.3481   LearningRate 0.0876   Epoch: 1   Global Step: 15900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:15:06,747-Speed 3138.97 samples/sec   Loss 15.6051   LearningRate 0.0876   Epoch: 1   Global Step: 15910   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:15:10,022-Speed 3127.24 samples/sec   Loss 15.4132   LearningRate 0.0876   Epoch: 1   Global Step: 15920   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:15:13,335-Speed 3092.68 samples/sec   Loss 15.3628   LearningRate 0.0876   Epoch: 1   Global Step: 15930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:15:16,752-Speed 2996.75 samples/sec   Loss 15.3585   LearningRate 0.0876   Epoch: 1   Global Step: 15940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:15:20,078-Speed 3080.13 samples/sec   Loss 15.3177   LearningRate 0.0876   Epoch: 1   Global Step: 15950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:15:23,419-Speed 3066.14 samples/sec   Loss 15.4515   LearningRate 0.0876   Epoch: 1   Global Step: 15960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:15:26,739-Speed 3084.82 samples/sec   Loss 15.4290   LearningRate 0.0876   Epoch: 1   Global Step: 15970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:15:30,002-Speed 3140.06 samples/sec   Loss 15.3685   LearningRate 0.0875   Epoch: 1   Global Step: 15980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:15:33,265-Speed 3139.01 samples/sec   Loss 15.4175   LearningRate 0.0875   Epoch: 1   Global Step: 15990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:15:36,547-Speed 3120.77 samples/sec   Loss 15.3907   LearningRate 0.0875   Epoch: 1   Global Step: 16000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:15:39,842-Speed 3108.82 samples/sec   Loss 15.2318   LearningRate 0.0875   Epoch: 1   Global Step: 16010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:15:43,100-Speed 3144.02 samples/sec   Loss 15.4525   LearningRate 0.0875   Epoch: 1   Global Step: 16020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:15:46,417-Speed 3087.94 samples/sec   Loss 15.4053   LearningRate 0.0875   Epoch: 1   Global Step: 16030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:15:49,715-Speed 3106.15 samples/sec   Loss 15.4783   LearningRate 0.0875   Epoch: 1   Global Step: 16040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:15:53,051-Speed 3071.02 samples/sec   Loss 15.2993   LearningRate 0.0875   Epoch: 1   Global Step: 16050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:15:56,332-Speed 3122.43 samples/sec   Loss 15.5131   LearningRate 0.0875   Epoch: 1   Global Step: 16060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:15:59,722-Speed 3020.99 samples/sec   Loss 15.3153   LearningRate 0.0875   Epoch: 1   Global Step: 16070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:03,024-Speed 3102.94 samples/sec   Loss 15.3797   LearningRate 0.0875   Epoch: 1   Global Step: 16080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:06,288-Speed 3137.99 samples/sec   Loss 15.3157   LearningRate 0.0875   Epoch: 1   Global Step: 16090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:09,643-Speed 3052.79 samples/sec   Loss 15.3575   LearningRate 0.0875   Epoch: 1   Global Step: 16100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:12,930-Speed 3117.07 samples/sec   Loss 15.3006   LearningRate 0.0875   Epoch: 1   Global Step: 16110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:16,261-Speed 3074.68 samples/sec   Loss 15.3374   LearningRate 0.0874   Epoch: 1   Global Step: 16120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:19,535-Speed 3129.43 samples/sec   Loss 15.2544   LearningRate 0.0874   Epoch: 1   Global Step: 16130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:22,794-Speed 3143.21 samples/sec   Loss 15.3134   LearningRate 0.0874   Epoch: 1   Global Step: 16140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:26,157-Speed 3045.18 samples/sec   Loss 15.4423   LearningRate 0.0874   Epoch: 1   Global Step: 16150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:29,514-Speed 3051.22 samples/sec   Loss 15.2315   LearningRate 0.0874   Epoch: 1   Global Step: 16160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:32,834-Speed 3086.17 samples/sec   Loss 15.1249   LearningRate 0.0874   Epoch: 1   Global Step: 16170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:36,133-Speed 3104.25 samples/sec   Loss 15.5116   LearningRate 0.0874   Epoch: 1   Global Step: 16180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:39,404-Speed 3131.31 samples/sec   Loss 15.0256   LearningRate 0.0874   Epoch: 1   Global Step: 16190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:42,736-Speed 3074.80 samples/sec   Loss 15.3360   LearningRate 0.0874   Epoch: 1   Global Step: 16200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:46,046-Speed 3094.57 samples/sec   Loss 15.3551   LearningRate 0.0874   Epoch: 1   Global Step: 16210   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:49,392-Speed 3061.36 samples/sec   Loss 15.3842   LearningRate 0.0874   Epoch: 1   Global Step: 16220   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:52,656-Speed 3137.53 samples/sec   Loss 15.2977   LearningRate 0.0874   Epoch: 1   Global Step: 16230   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:55,993-Speed 3069.99 samples/sec   Loss 15.3538   LearningRate 0.0874   Epoch: 1   Global Step: 16240   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:16:59,343-Speed 3058.30 samples/sec   Loss 15.6020   LearningRate 0.0873   Epoch: 1   Global Step: 16250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:17:02,656-Speed 3093.48 samples/sec   Loss 15.3879   LearningRate 0.0873   Epoch: 1   Global Step: 16260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:17:06,023-Speed 3041.85 samples/sec   Loss 15.5560   LearningRate 0.0873   Epoch: 1   Global Step: 16270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:17:09,325-Speed 3101.82 samples/sec   Loss 15.3597   LearningRate 0.0873   Epoch: 1   Global Step: 16280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:17:12,630-Speed 3099.48 samples/sec   Loss 15.2653   LearningRate 0.0873   Epoch: 1   Global Step: 16290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:17:15,919-Speed 3114.20 samples/sec   Loss 15.2252   LearningRate 0.0873   Epoch: 1   Global Step: 16300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:17:19,203-Speed 3120.39 samples/sec   Loss 15.2944   LearningRate 0.0873   Epoch: 1   Global Step: 16310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:17:22,520-Speed 3087.65 samples/sec   Loss 15.2865   LearningRate 0.0873   Epoch: 1   Global Step: 16320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:17:25,895-Speed 3035.29 samples/sec   Loss 15.3299   LearningRate 0.0873   Epoch: 1   Global Step: 16330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:17:29,203-Speed 3096.57 samples/sec   Loss 15.3196   LearningRate 0.0873   Epoch: 1   Global Step: 16340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:17:32,490-Speed 3116.64 samples/sec   Loss 15.4667   LearningRate 0.0873   Epoch: 1   Global Step: 16350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:17:35,770-Speed 3122.86 samples/sec   Loss 15.3270   LearningRate 0.0873   Epoch: 1   Global Step: 16360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:17:39,072-Speed 3102.10 samples/sec   Loss 15.2819   LearningRate 0.0873   Epoch: 1   Global Step: 16370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:17:42,414-Speed 3064.73 samples/sec   Loss 15.3246   LearningRate 0.0872   Epoch: 1   Global Step: 16380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:17:45,709-Speed 3109.06 samples/sec   Loss 15.3632   LearningRate 0.0872   Epoch: 1   Global Step: 16390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:17:49,035-Speed 3079.19 samples/sec   Loss 15.1822   LearningRate 0.0872   Epoch: 1   Global Step: 16400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:17:52,381-Speed 3061.73 samples/sec   Loss 15.0581   LearningRate 0.0872   Epoch: 1   Global Step: 16410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:17:55,708-Speed 3078.28 samples/sec   Loss 15.2622   LearningRate 0.0872   Epoch: 1   Global Step: 16420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:17:59,041-Speed 3073.85 samples/sec   Loss 15.3462   LearningRate 0.0872   Epoch: 1   Global Step: 16430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:02,366-Speed 3080.71 samples/sec   Loss 15.3469   LearningRate 0.0872   Epoch: 1   Global Step: 16440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:05,698-Speed 3073.58 samples/sec   Loss 15.2015   LearningRate 0.0872   Epoch: 1   Global Step: 16450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:09,051-Speed 3054.94 samples/sec   Loss 15.1934   LearningRate 0.0872   Epoch: 1   Global Step: 16460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:12,358-Speed 3097.53 samples/sec   Loss 15.3299   LearningRate 0.0872   Epoch: 1   Global Step: 16470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:15,656-Speed 3105.61 samples/sec   Loss 15.2421   LearningRate 0.0872   Epoch: 1   Global Step: 16480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:18,956-Speed 3104.12 samples/sec   Loss 15.1340   LearningRate 0.0872   Epoch: 1   Global Step: 16490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:22,373-Speed 2997.71 samples/sec   Loss 15.2506   LearningRate 0.0872   Epoch: 1   Global Step: 16500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:25,689-Speed 3087.83 samples/sec   Loss 15.3546   LearningRate 0.0871   Epoch: 1   Global Step: 16510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:29,030-Speed 3066.61 samples/sec   Loss 15.1252   LearningRate 0.0871   Epoch: 1   Global Step: 16520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:32,329-Speed 3104.23 samples/sec   Loss 15.3484   LearningRate 0.0871   Epoch: 1   Global Step: 16530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:35,674-Speed 3062.90 samples/sec   Loss 15.2525   LearningRate 0.0871   Epoch: 1   Global Step: 16540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:38,974-Speed 3103.04 samples/sec   Loss 15.2216   LearningRate 0.0871   Epoch: 1   Global Step: 16550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:42,346-Speed 3038.32 samples/sec   Loss 15.3614   LearningRate 0.0871   Epoch: 1   Global Step: 16560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:45,683-Speed 3069.79 samples/sec   Loss 15.2647   LearningRate 0.0871   Epoch: 1   Global Step: 16570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:49,032-Speed 3057.84 samples/sec   Loss 15.1958   LearningRate 0.0871   Epoch: 1   Global Step: 16580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:52,371-Speed 3067.99 samples/sec   Loss 15.1609   LearningRate 0.0871   Epoch: 1   Global Step: 16590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:55,657-Speed 3117.97 samples/sec   Loss 15.2466   LearningRate 0.0871   Epoch: 1   Global Step: 16600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:18:58,965-Speed 3096.19 samples/sec   Loss 15.2139   LearningRate 0.0871   Epoch: 1   Global Step: 16610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:19:02,274-Speed 3095.22 samples/sec   Loss 15.2725   LearningRate 0.0871   Epoch: 1   Global Step: 16620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:19:05,607-Speed 3073.57 samples/sec   Loss 15.2488   LearningRate 0.0871   Epoch: 1   Global Step: 16630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:19:08,911-Speed 3099.66 samples/sec   Loss 14.9688   LearningRate 0.0871   Epoch: 1   Global Step: 16640   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:19:12,276-Speed 3044.14 samples/sec   Loss 15.2464   LearningRate 0.0870   Epoch: 1   Global Step: 16650   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:19:15,625-Speed 3058.07 samples/sec   Loss 15.2217   LearningRate 0.0870   Epoch: 1   Global Step: 16660   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:19:18,951-Speed 3079.63 samples/sec   Loss 15.2078   LearningRate 0.0870   Epoch: 1   Global Step: 16670   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:19:22,307-Speed 3052.65 samples/sec   Loss 15.2525   LearningRate 0.0870   Epoch: 1   Global Step: 16680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:19:25,714-Speed 3006.88 samples/sec   Loss 15.3526   LearningRate 0.0870   Epoch: 1   Global Step: 16690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:19:29,014-Speed 3103.53 samples/sec   Loss 15.3918   LearningRate 0.0870   Epoch: 1   Global Step: 16700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:19:32,305-Speed 3111.78 samples/sec   Loss 15.1994   LearningRate 0.0870   Epoch: 1   Global Step: 16710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:19:35,658-Speed 3054.91 samples/sec   Loss 15.3343   LearningRate 0.0870   Epoch: 1   Global Step: 16720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:19:38,994-Speed 3070.46 samples/sec   Loss 15.1545   LearningRate 0.0870   Epoch: 1   Global Step: 16730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:19:42,340-Speed 3061.47 samples/sec   Loss 15.1522   LearningRate 0.0870   Epoch: 1   Global Step: 16740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:19:45,657-Speed 3087.96 samples/sec   Loss 15.1580   LearningRate 0.0870   Epoch: 1   Global Step: 16750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:19:48,913-Speed 3146.14 samples/sec   Loss 15.0609   LearningRate 0.0870   Epoch: 1   Global Step: 16760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:19:52,194-Speed 3121.16 samples/sec   Loss 15.3560   LearningRate 0.0870   Epoch: 1   Global Step: 16770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:19:55,480-Speed 3117.86 samples/sec   Loss 15.3299   LearningRate 0.0869   Epoch: 1   Global Step: 16780   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:19:58,761-Speed 3122.21 samples/sec   Loss 15.3381   LearningRate 0.0869   Epoch: 1   Global Step: 16790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:20:02,095-Speed 3071.33 samples/sec   Loss 15.1473   LearningRate 0.0869   Epoch: 1   Global Step: 16800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:20:05,397-Speed 3102.16 samples/sec   Loss 15.0855   LearningRate 0.0869   Epoch: 1   Global Step: 16810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:20:08,666-Speed 3133.36 samples/sec   Loss 15.3119   LearningRate 0.0869   Epoch: 1   Global Step: 16820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:20:11,959-Speed 3110.80 samples/sec   Loss 15.2914   LearningRate 0.0869   Epoch: 1   Global Step: 16830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:20:15,336-Speed 3032.91 samples/sec   Loss 15.2223   LearningRate 0.0869   Epoch: 1   Global Step: 16840   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:20:18,690-Speed 3054.23 samples/sec   Loss 15.3601   LearningRate 0.0869   Epoch: 1   Global Step: 16850   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:20:22,014-Speed 3081.62 samples/sec   Loss 15.1712   LearningRate 0.0869   Epoch: 1   Global Step: 16860   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:20:25,330-Speed 3088.30 samples/sec   Loss 15.4292   LearningRate 0.0869   Epoch: 1   Global Step: 16870   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:20:28,585-Speed 3147.77 samples/sec   Loss 15.2004   LearningRate 0.0869   Epoch: 1   Global Step: 16880   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:20:31,920-Speed 3070.46 samples/sec   Loss 15.3529   LearningRate 0.0869   Epoch: 1   Global Step: 16890   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:20:35,171-Speed 3151.18 samples/sec   Loss 15.3008   LearningRate 0.0869   Epoch: 1   Global Step: 16900   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:20:38,487-Speed 3089.15 samples/sec   Loss 15.2541   LearningRate 0.0868   Epoch: 1   Global Step: 16910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:20:41,787-Speed 3103.90 samples/sec   Loss 15.3058   LearningRate 0.0868   Epoch: 1   Global Step: 16920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:20:45,037-Speed 3151.92 samples/sec   Loss 14.9788   LearningRate 0.0868   Epoch: 1   Global Step: 16930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:20:48,303-Speed 3135.51 samples/sec   Loss 15.3806   LearningRate 0.0868   Epoch: 1   Global Step: 16940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:20:51,620-Speed 3088.92 samples/sec   Loss 15.2195   LearningRate 0.0868   Epoch: 1   Global Step: 16950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:20:54,950-Speed 3075.42 samples/sec   Loss 15.3706   LearningRate 0.0868   Epoch: 1   Global Step: 16960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:20:58,283-Speed 3073.43 samples/sec   Loss 15.1653   LearningRate 0.0868   Epoch: 1   Global Step: 16970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:21:01,552-Speed 3133.76 samples/sec   Loss 15.1028   LearningRate 0.0868   Epoch: 1   Global Step: 16980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:21:04,839-Speed 3115.88 samples/sec   Loss 15.2534   LearningRate 0.0868   Epoch: 1   Global Step: 16990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:21:08,246-Speed 3007.21 samples/sec   Loss 15.1494   LearningRate 0.0868   Epoch: 1   Global Step: 17000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:21:11,629-Speed 3027.47 samples/sec   Loss 15.2051   LearningRate 0.0868   Epoch: 1   Global Step: 17010   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:14,944-Speed 3089.62 samples/sec   Loss 15.2018   LearningRate 0.0868   Epoch: 1   Global Step: 17020   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:18,284-Speed 3066.66 samples/sec   Loss 15.3767   LearningRate 0.0868   Epoch: 1   Global Step: 17030   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:21,598-Speed 3091.48 samples/sec   Loss 15.1443   LearningRate 0.0868   Epoch: 1   Global Step: 17040   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:24,920-Speed 3082.71 samples/sec   Loss 15.2529   LearningRate 0.0867   Epoch: 1   Global Step: 17050   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:28,262-Speed 3065.40 samples/sec   Loss 15.0507   LearningRate 0.0867   Epoch: 1   Global Step: 17060   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:31,579-Speed 3088.55 samples/sec   Loss 15.1389   LearningRate 0.0867   Epoch: 1   Global Step: 17070   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:34,978-Speed 3013.42 samples/sec   Loss 15.2000   LearningRate 0.0867   Epoch: 1   Global Step: 17080   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:38,255-Speed 3125.38 samples/sec   Loss 15.1968   LearningRate 0.0867   Epoch: 1   Global Step: 17090   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:41,535-Speed 3122.48 samples/sec   Loss 15.0159   LearningRate 0.0867   Epoch: 1   Global Step: 17100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:44,777-Speed 3160.20 samples/sec   Loss 15.3396   LearningRate 0.0867   Epoch: 1   Global Step: 17110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:48,096-Speed 3085.76 samples/sec   Loss 15.2519   LearningRate 0.0867   Epoch: 1   Global Step: 17120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:51,438-Speed 3064.86 samples/sec   Loss 15.3089   LearningRate 0.0867   Epoch: 1   Global Step: 17130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:54,697-Speed 3142.98 samples/sec   Loss 15.1090   LearningRate 0.0867   Epoch: 1   Global Step: 17140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:21:57,991-Speed 3109.22 samples/sec   Loss 15.2090   LearningRate 0.0867   Epoch: 1   Global Step: 17150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:22:01,329-Speed 3069.03 samples/sec   Loss 15.1798   LearningRate 0.0867   Epoch: 1   Global Step: 17160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:22:04,610-Speed 3121.99 samples/sec   Loss 15.2514   LearningRate 0.0867   Epoch: 1   Global Step: 17170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:22:07,926-Speed 3088.26 samples/sec   Loss 15.1428   LearningRate 0.0866   Epoch: 1   Global Step: 17180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:22:11,328-Speed 3011.41 samples/sec   Loss 15.2615   LearningRate 0.0866   Epoch: 1   Global Step: 17190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:22:14,694-Speed 3043.19 samples/sec   Loss 15.0915   LearningRate 0.0866   Epoch: 1   Global Step: 17200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:22:18,035-Speed 3065.92 samples/sec   Loss 15.2755   LearningRate 0.0866   Epoch: 1   Global Step: 17210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:22:21,347-Speed 3092.97 samples/sec   Loss 15.2038   LearningRate 0.0866   Epoch: 1   Global Step: 17220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:22:24,642-Speed 3108.83 samples/sec   Loss 15.2295   LearningRate 0.0866   Epoch: 1   Global Step: 17230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:22:27,898-Speed 3145.63 samples/sec   Loss 15.1331   LearningRate 0.0866   Epoch: 1   Global Step: 17240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:22:31,197-Speed 3104.60 samples/sec   Loss 15.1442   LearningRate 0.0866   Epoch: 1   Global Step: 17250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:22:34,542-Speed 3062.73 samples/sec   Loss 15.3272   LearningRate 0.0866   Epoch: 1   Global Step: 17260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:22:37,809-Speed 3134.34 samples/sec   Loss 15.2063   LearningRate 0.0866   Epoch: 1   Global Step: 17270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:22:41,130-Speed 3090.13 samples/sec   Loss 15.1389   LearningRate 0.0866   Epoch: 1   Global Step: 17280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:22:44,511-Speed 3029.76 samples/sec   Loss 15.1556   LearningRate 0.0866   Epoch: 1   Global Step: 17290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:22:47,826-Speed 3090.01 samples/sec   Loss 15.3561   LearningRate 0.0866   Epoch: 1   Global Step: 17300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:22:51,141-Speed 3089.76 samples/sec   Loss 15.2250   LearningRate 0.0865   Epoch: 1   Global Step: 17310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:22:54,439-Speed 3105.74 samples/sec   Loss 15.1090   LearningRate 0.0865   Epoch: 1   Global Step: 17320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:22:57,871-Speed 2983.88 samples/sec   Loss 15.1134   LearningRate 0.0865   Epoch: 1   Global Step: 17330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:01,288-Speed 2998.36 samples/sec   Loss 15.1344   LearningRate 0.0865   Epoch: 1   Global Step: 17340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:04,709-Speed 2993.40 samples/sec   Loss 15.2363   LearningRate 0.0865   Epoch: 1   Global Step: 17350   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 03:23:08,014-Speed 3099.81 samples/sec   Loss 15.1752   LearningRate 0.0865   Epoch: 1   Global Step: 17360   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:11,366-Speed 3055.86 samples/sec   Loss 15.2173   LearningRate 0.0865   Epoch: 1   Global Step: 17370   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:14,685-Speed 3086.14 samples/sec   Loss 15.0116   LearningRate 0.0865   Epoch: 1   Global Step: 17380   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:17,983-Speed 3105.94 samples/sec   Loss 15.1681   LearningRate 0.0865   Epoch: 1   Global Step: 17390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:21,315-Speed 3074.16 samples/sec   Loss 15.0919   LearningRate 0.0865   Epoch: 1   Global Step: 17400   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:24,664-Speed 3058.18 samples/sec   Loss 15.1449   LearningRate 0.0865   Epoch: 1   Global Step: 17410   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:28,008-Speed 3063.33 samples/sec   Loss 15.0936   LearningRate 0.0865   Epoch: 1   Global Step: 17420   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:31,390-Speed 3028.43 samples/sec   Loss 15.1762   LearningRate 0.0865   Epoch: 1   Global Step: 17430   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:34,738-Speed 3060.02 samples/sec   Loss 15.3984   LearningRate 0.0865   Epoch: 1   Global Step: 17440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:38,014-Speed 3126.86 samples/sec   Loss 15.2022   LearningRate 0.0864   Epoch: 1   Global Step: 17450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:41,348-Speed 3071.69 samples/sec   Loss 15.1178   LearningRate 0.0864   Epoch: 1   Global Step: 17460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:44,654-Speed 3098.52 samples/sec   Loss 15.1743   LearningRate 0.0864   Epoch: 1   Global Step: 17470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:48,060-Speed 3007.28 samples/sec   Loss 14.9733   LearningRate 0.0864   Epoch: 1   Global Step: 17480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:51,477-Speed 2998.40 samples/sec   Loss 15.2495   LearningRate 0.0864   Epoch: 1   Global Step: 17490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:54,810-Speed 3072.84 samples/sec   Loss 15.1854   LearningRate 0.0864   Epoch: 1   Global Step: 17500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:23:58,176-Speed 3043.14 samples/sec   Loss 15.1956   LearningRate 0.0864   Epoch: 1   Global Step: 17510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:24:01,483-Speed 3097.62 samples/sec   Loss 15.2704   LearningRate 0.0864   Epoch: 1   Global Step: 17520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:24:04,760-Speed 3125.59 samples/sec   Loss 15.2272   LearningRate 0.0864   Epoch: 1   Global Step: 17530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:24:08,161-Speed 3011.88 samples/sec   Loss 15.2319   LearningRate 0.0864   Epoch: 1   Global Step: 17540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:24:11,601-Speed 2976.76 samples/sec   Loss 15.0704   LearningRate 0.0864   Epoch: 1   Global Step: 17550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:24:14,914-Speed 3092.22 samples/sec   Loss 15.2791   LearningRate 0.0864   Epoch: 1   Global Step: 17560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:24:18,276-Speed 3046.40 samples/sec   Loss 15.0863   LearningRate 0.0864   Epoch: 1   Global Step: 17570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:24:21,588-Speed 3093.88 samples/sec   Loss 15.1523   LearningRate 0.0863   Epoch: 1   Global Step: 17580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:24:24,897-Speed 3095.27 samples/sec   Loss 15.2778   LearningRate 0.0863   Epoch: 1   Global Step: 17590   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:24:28,246-Speed 3058.43 samples/sec   Loss 15.0739   LearningRate 0.0863   Epoch: 1   Global Step: 17600   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:24:31,616-Speed 3039.69 samples/sec   Loss 15.1572   LearningRate 0.0863   Epoch: 1   Global Step: 17610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:24:34,962-Speed 3060.32 samples/sec   Loss 15.0546   LearningRate 0.0863   Epoch: 1   Global Step: 17620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:24:38,266-Speed 3100.86 samples/sec   Loss 15.1887   LearningRate 0.0863   Epoch: 1   Global Step: 17630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:24:41,550-Speed 3118.82 samples/sec   Loss 14.9595   LearningRate 0.0863   Epoch: 1   Global Step: 17640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:24:44,861-Speed 3093.50 samples/sec   Loss 15.1884   LearningRate 0.0863   Epoch: 1   Global Step: 17650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:24:48,144-Speed 3120.06 samples/sec   Loss 15.2785   LearningRate 0.0863   Epoch: 1   Global Step: 17660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:24:51,454-Speed 3094.35 samples/sec   Loss 15.0099   LearningRate 0.0863   Epoch: 1   Global Step: 17670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 03:24:54,770-Speed 3089.35 samples/sec   Loss 14.9926   LearningRate 0.0863   Epoch: 1   Global Step: 17680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:24:58,060-Speed 3114.22 samples/sec   Loss 15.0618   LearningRate 0.0863   Epoch: 1   Global Step: 17690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:01,352-Speed 3111.53 samples/sec   Loss 15.1348   LearningRate 0.0863   Epoch: 1   Global Step: 17700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:04,692-Speed 3066.63 samples/sec   Loss 14.9178   LearningRate 0.0863   Epoch: 1   Global Step: 17710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:08,017-Speed 3080.25 samples/sec   Loss 14.9843   LearningRate 0.0862   Epoch: 1   Global Step: 17720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:11,332-Speed 3089.88 samples/sec   Loss 15.2903   LearningRate 0.0862   Epoch: 1   Global Step: 17730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:14,637-Speed 3099.26 samples/sec   Loss 15.1036   LearningRate 0.0862   Epoch: 1   Global Step: 17740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:17,915-Speed 3124.73 samples/sec   Loss 15.0109   LearningRate 0.0862   Epoch: 1   Global Step: 17750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:21,256-Speed 3065.93 samples/sec   Loss 15.0815   LearningRate 0.0862   Epoch: 1   Global Step: 17760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:24,579-Speed 3082.17 samples/sec   Loss 15.2312   LearningRate 0.0862   Epoch: 1   Global Step: 17770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:27,927-Speed 3059.76 samples/sec   Loss 15.1871   LearningRate 0.0862   Epoch: 1   Global Step: 17780   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 03:25:31,200-Speed 3129.10 samples/sec   Loss 15.2028   LearningRate 0.0862   Epoch: 1   Global Step: 17790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:34,493-Speed 3110.98 samples/sec   Loss 15.3953   LearningRate 0.0862   Epoch: 1   Global Step: 17800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:37,745-Speed 3149.50 samples/sec   Loss 15.0557   LearningRate 0.0862   Epoch: 1   Global Step: 17810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:41,032-Speed 3117.50 samples/sec   Loss 14.8698   LearningRate 0.0862   Epoch: 1   Global Step: 17820   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:44,321-Speed 3114.29 samples/sec   Loss 15.0983   LearningRate 0.0862   Epoch: 1   Global Step: 17830   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 03:25:47,616-Speed 3108.76 samples/sec   Loss 14.9814   LearningRate 0.0862   Epoch: 1   Global Step: 17840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:25:50,980-Speed 3044.72 samples/sec   Loss 15.1747   LearningRate 0.0861   Epoch: 1   Global Step: 17850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:25:54,305-Speed 3081.33 samples/sec   Loss 15.1243   LearningRate 0.0861   Epoch: 1   Global Step: 17860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:25:57,623-Speed 3089.12 samples/sec   Loss 15.0665   LearningRate 0.0861   Epoch: 1   Global Step: 17870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:00,951-Speed 3078.18 samples/sec   Loss 15.0478   LearningRate 0.0861   Epoch: 1   Global Step: 17880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:04,288-Speed 3068.94 samples/sec   Loss 15.0583   LearningRate 0.0861   Epoch: 1   Global Step: 17890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:07,645-Speed 3050.92 samples/sec   Loss 15.1290   LearningRate 0.0861   Epoch: 1   Global Step: 17900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:11,049-Speed 3009.78 samples/sec   Loss 15.1210   LearningRate 0.0861   Epoch: 1   Global Step: 17910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:14,398-Speed 3057.95 samples/sec   Loss 14.9979   LearningRate 0.0861   Epoch: 1   Global Step: 17920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:17,765-Speed 3042.15 samples/sec   Loss 15.0156   LearningRate 0.0861   Epoch: 1   Global Step: 17930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:21,078-Speed 3092.27 samples/sec   Loss 15.1855   LearningRate 0.0861   Epoch: 1   Global Step: 17940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:24,382-Speed 3100.37 samples/sec   Loss 15.0410   LearningRate 0.0861   Epoch: 1   Global Step: 17950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:27,726-Speed 3062.79 samples/sec   Loss 14.9936   LearningRate 0.0861   Epoch: 1   Global Step: 17960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:31,040-Speed 3090.87 samples/sec   Loss 15.0221   LearningRate 0.0861   Epoch: 1   Global Step: 17970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:34,409-Speed 3040.61 samples/sec   Loss 15.1371   LearningRate 0.0860   Epoch: 1   Global Step: 17980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:37,676-Speed 3135.11 samples/sec   Loss 14.9001   LearningRate 0.0860   Epoch: 1   Global Step: 17990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:41,020-Speed 3063.25 samples/sec   Loss 15.0990   LearningRate 0.0860   Epoch: 1   Global Step: 18000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:44,384-Speed 3044.52 samples/sec   Loss 15.1012   LearningRate 0.0860   Epoch: 1   Global Step: 18010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:47,717-Speed 3073.30 samples/sec   Loss 14.9493   LearningRate 0.0860   Epoch: 1   Global Step: 18020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:51,112-Speed 3017.21 samples/sec   Loss 15.0398   LearningRate 0.0860   Epoch: 1   Global Step: 18030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:54,470-Speed 3050.20 samples/sec   Loss 15.0873   LearningRate 0.0860   Epoch: 1   Global Step: 18040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:26:57,766-Speed 3107.60 samples/sec   Loss 15.0193   LearningRate 0.0860   Epoch: 1   Global Step: 18050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:01,122-Speed 3052.89 samples/sec   Loss 15.0197   LearningRate 0.0860   Epoch: 1   Global Step: 18060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:04,460-Speed 3068.42 samples/sec   Loss 14.9366   LearningRate 0.0860   Epoch: 1   Global Step: 18070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:07,762-Speed 3101.85 samples/sec   Loss 15.0481   LearningRate 0.0860   Epoch: 1   Global Step: 18080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:11,158-Speed 3016.49 samples/sec   Loss 15.0849   LearningRate 0.0860   Epoch: 1   Global Step: 18090   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 03:27:14,509-Speed 3056.86 samples/sec   Loss 14.9035   LearningRate 0.0860   Epoch: 1   Global Step: 18100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:17,836-Speed 3078.75 samples/sec   Loss 15.1691   LearningRate 0.0860   Epoch: 1   Global Step: 18110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:21,157-Speed 3083.93 samples/sec   Loss 15.0879   LearningRate 0.0859   Epoch: 1   Global Step: 18120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:24,551-Speed 3018.20 samples/sec   Loss 15.0547   LearningRate 0.0859   Epoch: 1   Global Step: 18130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:27,938-Speed 3024.34 samples/sec   Loss 14.8966   LearningRate 0.0859   Epoch: 1   Global Step: 18140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:31,289-Speed 3056.57 samples/sec   Loss 14.9849   LearningRate 0.0859   Epoch: 1   Global Step: 18150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:34,607-Speed 3087.23 samples/sec   Loss 14.9674   LearningRate 0.0859   Epoch: 1   Global Step: 18160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:37,935-Speed 3077.99 samples/sec   Loss 15.0055   LearningRate 0.0859   Epoch: 1   Global Step: 18170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:41,297-Speed 3047.13 samples/sec   Loss 14.9231   LearningRate 0.0859   Epoch: 1   Global Step: 18180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:44,621-Speed 3081.15 samples/sec   Loss 14.9271   LearningRate 0.0859   Epoch: 1   Global Step: 18190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:47,947-Speed 3079.54 samples/sec   Loss 15.0827   LearningRate 0.0859   Epoch: 1   Global Step: 18200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:51,319-Speed 3038.22 samples/sec   Loss 14.9354   LearningRate 0.0859   Epoch: 1   Global Step: 18210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:54,699-Speed 3030.18 samples/sec   Loss 14.8800   LearningRate 0.0859   Epoch: 1   Global Step: 18220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:27:58,023-Speed 3084.43 samples/sec   Loss 14.9808   LearningRate 0.0859   Epoch: 1   Global Step: 18230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:28:01,363-Speed 3066.21 samples/sec   Loss 15.0074   LearningRate 0.0859   Epoch: 1   Global Step: 18240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:28:04,723-Speed 3048.52 samples/sec   Loss 15.1408   LearningRate 0.0858   Epoch: 1   Global Step: 18250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:28:08,074-Speed 3056.98 samples/sec   Loss 15.1166   LearningRate 0.0858   Epoch: 1   Global Step: 18260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:28:11,420-Speed 3061.59 samples/sec   Loss 15.0706   LearningRate 0.0858   Epoch: 1   Global Step: 18270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:28:14,797-Speed 3032.79 samples/sec   Loss 14.8525   LearningRate 0.0858   Epoch: 1   Global Step: 18280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:28:18,191-Speed 3017.84 samples/sec   Loss 15.0327   LearningRate 0.0858   Epoch: 1   Global Step: 18290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:28:21,470-Speed 3124.53 samples/sec   Loss 15.1405   LearningRate 0.0858   Epoch: 1   Global Step: 18300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:28:24,760-Speed 3113.16 samples/sec   Loss 14.9043   LearningRate 0.0858   Epoch: 1   Global Step: 18310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:28:28,074-Speed 3090.06 samples/sec   Loss 15.0298   LearningRate 0.0858   Epoch: 1   Global Step: 18320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:28:31,334-Speed 3143.75 samples/sec   Loss 14.9682   LearningRate 0.0858   Epoch: 1   Global Step: 18330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:28:34,744-Speed 3003.25 samples/sec   Loss 15.2191   LearningRate 0.0858   Epoch: 1   Global Step: 18340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:28:38,085-Speed 3065.29 samples/sec   Loss 15.0235   LearningRate 0.0858   Epoch: 1   Global Step: 18350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:28:41,416-Speed 3075.89 samples/sec   Loss 14.9092   LearningRate 0.0858   Epoch: 1   Global Step: 18360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:28:44,761-Speed 3062.14 samples/sec   Loss 15.0771   LearningRate 0.0858   Epoch: 1   Global Step: 18370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:28:48,140-Speed 3031.30 samples/sec   Loss 14.8969   LearningRate 0.0857   Epoch: 1   Global Step: 18380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:28:51,454-Speed 3090.87 samples/sec   Loss 14.8869   LearningRate 0.0857   Epoch: 1   Global Step: 18390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:28:54,855-Speed 3011.37 samples/sec   Loss 14.8827   LearningRate 0.0857   Epoch: 1   Global Step: 18400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:28:58,199-Speed 3063.35 samples/sec   Loss 14.8504   LearningRate 0.0857   Epoch: 1   Global Step: 18410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:29:01,556-Speed 3051.12 samples/sec   Loss 15.0340   LearningRate 0.0857   Epoch: 1   Global Step: 18420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:29:04,872-Speed 3089.59 samples/sec   Loss 15.0381   LearningRate 0.0857   Epoch: 1   Global Step: 18430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:08,163-Speed 3111.99 samples/sec   Loss 14.9338   LearningRate 0.0857   Epoch: 1   Global Step: 18440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:11,477-Speed 3091.07 samples/sec   Loss 15.0345   LearningRate 0.0857   Epoch: 1   Global Step: 18450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:14,836-Speed 3049.74 samples/sec   Loss 15.0474   LearningRate 0.0857   Epoch: 1   Global Step: 18460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:18,189-Speed 3054.13 samples/sec   Loss 14.8443   LearningRate 0.0857   Epoch: 1   Global Step: 18470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:21,512-Speed 3082.59 samples/sec   Loss 14.9975   LearningRate 0.0857   Epoch: 1   Global Step: 18480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:24,783-Speed 3131.47 samples/sec   Loss 14.9349   LearningRate 0.0857   Epoch: 1   Global Step: 18490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:28,143-Speed 3048.23 samples/sec   Loss 15.1132   LearningRate 0.0857   Epoch: 1   Global Step: 18500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:31,497-Speed 3055.16 samples/sec   Loss 14.9985   LearningRate 0.0857   Epoch: 1   Global Step: 18510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:34,800-Speed 3101.04 samples/sec   Loss 15.0951   LearningRate 0.0856   Epoch: 1   Global Step: 18520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:38,063-Speed 3138.79 samples/sec   Loss 15.0024   LearningRate 0.0856   Epoch: 1   Global Step: 18530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:41,393-Speed 3076.13 samples/sec   Loss 14.8511   LearningRate 0.0856   Epoch: 1   Global Step: 18540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:44,725-Speed 3074.48 samples/sec   Loss 15.0339   LearningRate 0.0856   Epoch: 1   Global Step: 18550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:48,026-Speed 3102.53 samples/sec   Loss 15.1566   LearningRate 0.0856   Epoch: 1   Global Step: 18560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:51,361-Speed 3071.41 samples/sec   Loss 14.9229   LearningRate 0.0856   Epoch: 1   Global Step: 18570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:54,676-Speed 3090.10 samples/sec   Loss 14.9437   LearningRate 0.0856   Epoch: 1   Global Step: 18580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:29:58,059-Speed 3027.91 samples/sec   Loss 15.1181   LearningRate 0.0856   Epoch: 1   Global Step: 18590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:30:01,413-Speed 3053.72 samples/sec   Loss 15.0051   LearningRate 0.0856   Epoch: 1   Global Step: 18600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:30:04,732-Speed 3086.79 samples/sec   Loss 14.9150   LearningRate 0.0856   Epoch: 1   Global Step: 18610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:30:08,019-Speed 3116.49 samples/sec   Loss 15.0041   LearningRate 0.0856   Epoch: 1   Global Step: 18620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:30:11,333-Speed 3090.40 samples/sec   Loss 15.0693   LearningRate 0.0856   Epoch: 1   Global Step: 18630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:30:14,665-Speed 3073.99 samples/sec   Loss 14.9668   LearningRate 0.0856   Epoch: 1   Global Step: 18640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:30:17,932-Speed 3135.51 samples/sec   Loss 15.0625   LearningRate 0.0855   Epoch: 1   Global Step: 18650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:30:21,195-Speed 3139.39 samples/sec   Loss 15.0575   LearningRate 0.0855   Epoch: 1   Global Step: 18660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:30:24,587-Speed 3019.26 samples/sec   Loss 15.0600   LearningRate 0.0855   Epoch: 1   Global Step: 18670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:30:27,876-Speed 3114.73 samples/sec   Loss 14.8878   LearningRate 0.0855   Epoch: 1   Global Step: 18680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:30:31,133-Speed 3145.38 samples/sec   Loss 14.8290   LearningRate 0.0855   Epoch: 1   Global Step: 18690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:30:34,458-Speed 3080.38 samples/sec   Loss 14.9722   LearningRate 0.0855   Epoch: 1   Global Step: 18700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:30:37,837-Speed 3031.29 samples/sec   Loss 14.9493   LearningRate 0.0855   Epoch: 1   Global Step: 18710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:30:41,190-Speed 3055.04 samples/sec   Loss 14.9122   LearningRate 0.0855   Epoch: 1   Global Step: 18720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:30:44,486-Speed 3107.30 samples/sec   Loss 15.0004   LearningRate 0.0855   Epoch: 1   Global Step: 18730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:30:47,746-Speed 3142.35 samples/sec   Loss 14.8838   LearningRate 0.0855   Epoch: 1   Global Step: 18740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:30:51,082-Speed 3070.54 samples/sec   Loss 15.0348   LearningRate 0.0855   Epoch: 1   Global Step: 18750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:30:54,470-Speed 3023.05 samples/sec   Loss 14.8749   LearningRate 0.0855   Epoch: 1   Global Step: 18760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:30:57,825-Speed 3053.04 samples/sec   Loss 15.0023   LearningRate 0.0855   Epoch: 1   Global Step: 18770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:31:01,136-Speed 3093.72 samples/sec   Loss 15.0971   LearningRate 0.0855   Epoch: 1   Global Step: 18780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:31:04,470-Speed 3072.36 samples/sec   Loss 14.9743   LearningRate 0.0854   Epoch: 1   Global Step: 18790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:31:07,778-Speed 3096.13 samples/sec   Loss 15.0151   LearningRate 0.0854   Epoch: 1   Global Step: 18800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:31:11,122-Speed 3063.59 samples/sec   Loss 14.9366   LearningRate 0.0854   Epoch: 1   Global Step: 18810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:31:14,452-Speed 3076.04 samples/sec   Loss 14.9397   LearningRate 0.0854   Epoch: 1   Global Step: 18820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:31:17,803-Speed 3056.53 samples/sec   Loss 14.8274   LearningRate 0.0854   Epoch: 1   Global Step: 18830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:31:21,071-Speed 3134.67 samples/sec   Loss 14.9237   LearningRate 0.0854   Epoch: 1   Global Step: 18840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:31:24,379-Speed 3095.93 samples/sec   Loss 14.8501   LearningRate 0.0854   Epoch: 1   Global Step: 18850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:31:27,659-Speed 3123.34 samples/sec   Loss 14.9231   LearningRate 0.0854   Epoch: 1   Global Step: 18860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:31:30,964-Speed 3099.52 samples/sec   Loss 14.8360   LearningRate 0.0854   Epoch: 1   Global Step: 18870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:31:34,281-Speed 3087.42 samples/sec   Loss 14.9616   LearningRate 0.0854   Epoch: 1   Global Step: 18880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:31:37,584-Speed 3102.08 samples/sec   Loss 14.8942   LearningRate 0.0854   Epoch: 1   Global Step: 18890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:31:40,926-Speed 3064.16 samples/sec   Loss 15.0211   LearningRate 0.0854   Epoch: 1   Global Step: 18900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:31:44,204-Speed 3125.46 samples/sec   Loss 14.8980   LearningRate 0.0854   Epoch: 1   Global Step: 18910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:31:47,602-Speed 3014.13 samples/sec   Loss 14.9933   LearningRate 0.0853   Epoch: 1   Global Step: 18920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:31:50,936-Speed 3072.41 samples/sec   Loss 14.8690   LearningRate 0.0853   Epoch: 1   Global Step: 18930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:31:54,281-Speed 3061.65 samples/sec   Loss 14.9228   LearningRate 0.0853   Epoch: 1   Global Step: 18940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:31:57,574-Speed 3110.83 samples/sec   Loss 14.7758   LearningRate 0.0853   Epoch: 1   Global Step: 18950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:32:00,923-Speed 3057.91 samples/sec   Loss 14.9349   LearningRate 0.0853   Epoch: 1   Global Step: 18960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:32:04,176-Speed 3149.27 samples/sec   Loss 14.9482   LearningRate 0.0853   Epoch: 1   Global Step: 18970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:32:07,484-Speed 3096.30 samples/sec   Loss 15.0302   LearningRate 0.0853   Epoch: 1   Global Step: 18980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:32:10,851-Speed 3041.87 samples/sec   Loss 14.7718   LearningRate 0.0853   Epoch: 1   Global Step: 18990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:32:14,151-Speed 3104.33 samples/sec   Loss 14.7778   LearningRate 0.0853   Epoch: 1   Global Step: 19000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:32:17,493-Speed 3064.74 samples/sec   Loss 14.7721   LearningRate 0.0853   Epoch: 1   Global Step: 19010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:32:20,803-Speed 3094.95 samples/sec   Loss 14.9008   LearningRate 0.0853   Epoch: 1   Global Step: 19020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:32:24,102-Speed 3104.30 samples/sec   Loss 14.9968   LearningRate 0.0853   Epoch: 1   Global Step: 19030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:32:27,391-Speed 3114.42 samples/sec   Loss 14.8195   LearningRate 0.0853   Epoch: 1   Global Step: 19040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:32:30,708-Speed 3087.88 samples/sec   Loss 15.0174   LearningRate 0.0853   Epoch: 1   Global Step: 19050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:32:34,003-Speed 3108.21 samples/sec   Loss 14.8088   LearningRate 0.0852   Epoch: 1   Global Step: 19060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:32:37,378-Speed 3035.34 samples/sec   Loss 15.0365   LearningRate 0.0852   Epoch: 1   Global Step: 19070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:32:40,695-Speed 3087.99 samples/sec   Loss 14.8275   LearningRate 0.0852   Epoch: 1   Global Step: 19080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:32:44,034-Speed 3067.02 samples/sec   Loss 14.9582   LearningRate 0.0852   Epoch: 1   Global Step: 19090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:32:47,351-Speed 3088.06 samples/sec   Loss 14.8908   LearningRate 0.0852   Epoch: 1   Global Step: 19100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:32:50,650-Speed 3105.13 samples/sec   Loss 14.9067   LearningRate 0.0852   Epoch: 1   Global Step: 19110   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:32:53,993-Speed 3063.98 samples/sec   Loss 14.8106   LearningRate 0.0852   Epoch: 1   Global Step: 19120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:32:57,333-Speed 3066.58 samples/sec   Loss 14.7105   LearningRate 0.0852   Epoch: 1   Global Step: 19130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:33:00,624-Speed 3112.93 samples/sec   Loss 14.8180   LearningRate 0.0852   Epoch: 1   Global Step: 19140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:33:03,954-Speed 3075.43 samples/sec   Loss 14.9042   LearningRate 0.0852   Epoch: 1   Global Step: 19150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:33:07,260-Speed 3099.00 samples/sec   Loss 14.8209   LearningRate 0.0852   Epoch: 1   Global Step: 19160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:33:10,638-Speed 3031.36 samples/sec   Loss 14.8565   LearningRate 0.0852   Epoch: 1   Global Step: 19170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:33:14,042-Speed 3009.49 samples/sec   Loss 14.9181   LearningRate 0.0852   Epoch: 1   Global Step: 19180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:33:17,365-Speed 3082.12 samples/sec   Loss 14.7314   LearningRate 0.0851   Epoch: 1   Global Step: 19190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:33:20,618-Speed 3149.18 samples/sec   Loss 14.8663   LearningRate 0.0851   Epoch: 1   Global Step: 19200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:33:23,931-Speed 3091.05 samples/sec   Loss 14.8037   LearningRate 0.0851   Epoch: 1   Global Step: 19210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:33:27,234-Speed 3101.20 samples/sec   Loss 15.0185   LearningRate 0.0851   Epoch: 1   Global Step: 19220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:33:30,521-Speed 3116.47 samples/sec   Loss 14.8193   LearningRate 0.0851   Epoch: 1   Global Step: 19230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:33:33,868-Speed 3060.29 samples/sec   Loss 14.6785   LearningRate 0.0851   Epoch: 1   Global Step: 19240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:33:37,151-Speed 3120.15 samples/sec   Loss 15.0391   LearningRate 0.0851   Epoch: 1   Global Step: 19250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:33:40,480-Speed 3076.33 samples/sec   Loss 14.8239   LearningRate 0.0851   Epoch: 1   Global Step: 19260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:33:43,788-Speed 3097.07 samples/sec   Loss 14.7861   LearningRate 0.0851   Epoch: 1   Global Step: 19270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:33:47,121-Speed 3072.72 samples/sec   Loss 14.8338   LearningRate 0.0851   Epoch: 1   Global Step: 19280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:33:50,444-Speed 3082.31 samples/sec   Loss 14.8809   LearningRate 0.0851   Epoch: 1   Global Step: 19290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:33:53,720-Speed 3126.40 samples/sec   Loss 14.8438   LearningRate 0.0851   Epoch: 1   Global Step: 19300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:33:56,995-Speed 3127.63 samples/sec   Loss 14.7821   LearningRate 0.0851   Epoch: 1   Global Step: 19310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:00,283-Speed 3115.65 samples/sec   Loss 14.9898   LearningRate 0.0851   Epoch: 1   Global Step: 19320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:03,567-Speed 3119.00 samples/sec   Loss 14.8933   LearningRate 0.0850   Epoch: 1   Global Step: 19330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:06,931-Speed 3044.17 samples/sec   Loss 14.9272   LearningRate 0.0850   Epoch: 1   Global Step: 19340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:10,246-Speed 3090.43 samples/sec   Loss 14.8206   LearningRate 0.0850   Epoch: 1   Global Step: 19350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:13,519-Speed 3129.70 samples/sec   Loss 14.8624   LearningRate 0.0850   Epoch: 1   Global Step: 19360   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 03:34:16,837-Speed 3087.24 samples/sec   Loss 14.7145   LearningRate 0.0850   Epoch: 1   Global Step: 19370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:20,162-Speed 3079.88 samples/sec   Loss 14.8525   LearningRate 0.0850   Epoch: 1   Global Step: 19380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:23,531-Speed 3040.74 samples/sec   Loss 14.8709   LearningRate 0.0850   Epoch: 1   Global Step: 19390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:26,796-Speed 3137.43 samples/sec   Loss 14.8363   LearningRate 0.0850   Epoch: 1   Global Step: 19400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:30,073-Speed 3125.94 samples/sec   Loss 14.7324   LearningRate 0.0850   Epoch: 1   Global Step: 19410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:33,379-Speed 3098.20 samples/sec   Loss 14.7105   LearningRate 0.0850   Epoch: 1   Global Step: 19420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:36,668-Speed 3114.51 samples/sec   Loss 14.8278   LearningRate 0.0850   Epoch: 1   Global Step: 19430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:39,998-Speed 3075.58 samples/sec   Loss 14.8486   LearningRate 0.0850   Epoch: 1   Global Step: 19440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:43,268-Speed 3132.62 samples/sec   Loss 14.7592   LearningRate 0.0850   Epoch: 1   Global Step: 19450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:46,544-Speed 3126.86 samples/sec   Loss 14.8965   LearningRate 0.0849   Epoch: 1   Global Step: 19460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:49,791-Speed 3154.33 samples/sec   Loss 14.8942   LearningRate 0.0849   Epoch: 1   Global Step: 19470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:53,057-Speed 3135.82 samples/sec   Loss 14.8302   LearningRate 0.0849   Epoch: 1   Global Step: 19480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:56,347-Speed 3113.51 samples/sec   Loss 14.8247   LearningRate 0.0849   Epoch: 1   Global Step: 19490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:34:59,649-Speed 3102.36 samples/sec   Loss 14.7786   LearningRate 0.0849   Epoch: 1   Global Step: 19500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:02,939-Speed 3112.42 samples/sec   Loss 14.7231   LearningRate 0.0849   Epoch: 1   Global Step: 19510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:06,316-Speed 3033.21 samples/sec   Loss 14.7675   LearningRate 0.0849   Epoch: 1   Global Step: 19520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:09,669-Speed 3056.77 samples/sec   Loss 14.8580   LearningRate 0.0849   Epoch: 1   Global Step: 19530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:12,978-Speed 3096.03 samples/sec   Loss 14.6900   LearningRate 0.0849   Epoch: 1   Global Step: 19540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:16,253-Speed 3128.03 samples/sec   Loss 14.7729   LearningRate 0.0849   Epoch: 1   Global Step: 19550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:19,523-Speed 3132.13 samples/sec   Loss 14.6372   LearningRate 0.0849   Epoch: 1   Global Step: 19560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:22,785-Speed 3140.27 samples/sec   Loss 14.8490   LearningRate 0.0849   Epoch: 1   Global Step: 19570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:26,143-Speed 3050.26 samples/sec   Loss 14.7617   LearningRate 0.0849   Epoch: 1   Global Step: 19580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:29,502-Speed 3049.15 samples/sec   Loss 14.8071   LearningRate 0.0849   Epoch: 1   Global Step: 19590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:32,856-Speed 3053.39 samples/sec   Loss 14.7556   LearningRate 0.0848   Epoch: 1   Global Step: 19600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:36,159-Speed 3101.14 samples/sec   Loss 14.6961   LearningRate 0.0848   Epoch: 1   Global Step: 19610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:39,475-Speed 3088.90 samples/sec   Loss 14.7369   LearningRate 0.0848   Epoch: 1   Global Step: 19620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:42,735-Speed 3142.54 samples/sec   Loss 14.8257   LearningRate 0.0848   Epoch: 1   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:46,041-Speed 3097.71 samples/sec   Loss 14.8930   LearningRate 0.0848   Epoch: 1   Global Step: 19640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:49,380-Speed 3068.01 samples/sec   Loss 14.7809   LearningRate 0.0848   Epoch: 1   Global Step: 19650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:52,705-Speed 3080.47 samples/sec   Loss 14.8180   LearningRate 0.0848   Epoch: 1   Global Step: 19660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:35:55,978-Speed 3129.37 samples/sec   Loss 14.8247   LearningRate 0.0848   Epoch: 1   Global Step: 19670   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 03:35:59,292-Speed 3090.62 samples/sec   Loss 14.9389   LearningRate 0.0848   Epoch: 1   Global Step: 19680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:36:02,592-Speed 3104.67 samples/sec   Loss 14.7493   LearningRate 0.0848   Epoch: 1   Global Step: 19690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:36:05,899-Speed 3096.91 samples/sec   Loss 14.6787   LearningRate 0.0848   Epoch: 1   Global Step: 19700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:36:09,181-Speed 3121.38 samples/sec   Loss 14.7249   LearningRate 0.0848   Epoch: 1   Global Step: 19710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:36:12,476-Speed 3108.90 samples/sec   Loss 14.8554   LearningRate 0.0848   Epoch: 1   Global Step: 19720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:36:15,810-Speed 3071.94 samples/sec   Loss 14.8379   LearningRate 0.0847   Epoch: 1   Global Step: 19730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:36:19,102-Speed 3111.51 samples/sec   Loss 14.6916   LearningRate 0.0847   Epoch: 1   Global Step: 19740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:36:22,387-Speed 3117.69 samples/sec   Loss 14.7279   LearningRate 0.0847   Epoch: 1   Global Step: 19750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:36:25,688-Speed 3103.34 samples/sec   Loss 14.8611   LearningRate 0.0847   Epoch: 1   Global Step: 19760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:36:29,034-Speed 3060.97 samples/sec   Loss 14.7262   LearningRate 0.0847   Epoch: 1   Global Step: 19770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:36:32,316-Speed 3120.90 samples/sec   Loss 14.6684   LearningRate 0.0847   Epoch: 1   Global Step: 19780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:36:35,633-Speed 3088.77 samples/sec   Loss 14.7746   LearningRate 0.0847   Epoch: 1   Global Step: 19790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:36:39,041-Speed 3004.97 samples/sec   Loss 14.7466   LearningRate 0.0847   Epoch: 1   Global Step: 19800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:36:42,333-Speed 3111.30 samples/sec   Loss 14.7719   LearningRate 0.0847   Epoch: 1   Global Step: 19810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:36:45,595-Speed 3140.11 samples/sec   Loss 14.8382   LearningRate 0.0847   Epoch: 1   Global Step: 19820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:36:48,923-Speed 3078.66 samples/sec   Loss 14.9610   LearningRate 0.0847   Epoch: 1   Global Step: 19830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:36:52,277-Speed 3054.41 samples/sec   Loss 14.6003   LearningRate 0.0847   Epoch: 1   Global Step: 19840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:36:55,573-Speed 3107.43 samples/sec   Loss 14.7075   LearningRate 0.0847   Epoch: 1   Global Step: 19850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:36:58,849-Speed 3127.41 samples/sec   Loss 14.9352   LearningRate 0.0847   Epoch: 1   Global Step: 19860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:37:02,263-Speed 2999.75 samples/sec   Loss 14.7903   LearningRate 0.0846   Epoch: 1   Global Step: 19870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:37:05,597-Speed 3072.35 samples/sec   Loss 14.7385   LearningRate 0.0846   Epoch: 1   Global Step: 19880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:37:08,961-Speed 3044.94 samples/sec   Loss 14.7758   LearningRate 0.0846   Epoch: 1   Global Step: 19890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:37:12,277-Speed 3089.36 samples/sec   Loss 14.5032   LearningRate 0.0846   Epoch: 1   Global Step: 19900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:37:15,666-Speed 3021.91 samples/sec   Loss 14.9059   LearningRate 0.0846   Epoch: 1   Global Step: 19910   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:37:18,948-Speed 3121.59 samples/sec   Loss 14.7804   LearningRate 0.0846   Epoch: 1   Global Step: 19920   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:37:22,278-Speed 3076.46 samples/sec   Loss 14.8154   LearningRate 0.0846   Epoch: 1   Global Step: 19930   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:37:25,577-Speed 3104.06 samples/sec   Loss 14.7191   LearningRate 0.0846   Epoch: 1   Global Step: 19940   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:37:28,869-Speed 3111.19 samples/sec   Loss 14.8379   LearningRate 0.0846   Epoch: 1   Global Step: 19950   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:37:32,196-Speed 3078.84 samples/sec   Loss 14.7363   LearningRate 0.0846   Epoch: 1   Global Step: 19960   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:37:35,506-Speed 3095.10 samples/sec   Loss 14.6950   LearningRate 0.0846   Epoch: 1   Global Step: 19970   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:37:38,814-Speed 3095.59 samples/sec   Loss 14.6127   LearningRate 0.0846   Epoch: 1   Global Step: 19980   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:37:42,101-Speed 3116.97 samples/sec   Loss 14.6534   LearningRate 0.0846   Epoch: 1   Global Step: 19990   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:37:45,386-Speed 3117.55 samples/sec   Loss 14.6925   LearningRate 0.0845   Epoch: 1   Global Step: 20000   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:37:48,728-Speed 3065.06 samples/sec   Loss 14.8122   LearningRate 0.0845   Epoch: 1   Global Step: 20010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:37:52,031-Speed 3101.04 samples/sec   Loss 14.7494   LearningRate 0.0845   Epoch: 1   Global Step: 20020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:37:55,347-Speed 3088.80 samples/sec   Loss 14.7106   LearningRate 0.0845   Epoch: 1   Global Step: 20030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:37:58,686-Speed 3067.97 samples/sec   Loss 14.7012   LearningRate 0.0845   Epoch: 1   Global Step: 20040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:38:01,960-Speed 3128.62 samples/sec   Loss 14.6969   LearningRate 0.0845   Epoch: 1   Global Step: 20050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:38:05,276-Speed 3089.29 samples/sec   Loss 14.6606   LearningRate 0.0845   Epoch: 1   Global Step: 20060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:38:08,636-Speed 3049.22 samples/sec   Loss 14.8293   LearningRate 0.0845   Epoch: 1   Global Step: 20070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:38:12,036-Speed 3012.76 samples/sec   Loss 14.8227   LearningRate 0.0845   Epoch: 1   Global Step: 20080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:38:15,356-Speed 3085.14 samples/sec   Loss 14.7114   LearningRate 0.0845   Epoch: 1   Global Step: 20090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:38:18,690-Speed 3072.75 samples/sec   Loss 14.7867   LearningRate 0.0845   Epoch: 1   Global Step: 20100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:38:22,089-Speed 3013.32 samples/sec   Loss 14.6424   LearningRate 0.0845   Epoch: 1   Global Step: 20110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:38:25,452-Speed 3045.63 samples/sec   Loss 14.8131   LearningRate 0.0845   Epoch: 1   Global Step: 20120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:38:28,782-Speed 3075.89 samples/sec   Loss 14.6732   LearningRate 0.0845   Epoch: 1   Global Step: 20130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:38:32,155-Speed 3037.29 samples/sec   Loss 14.6971   LearningRate 0.0844   Epoch: 1   Global Step: 20140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:38:35,449-Speed 3109.13 samples/sec   Loss 14.6377   LearningRate 0.0844   Epoch: 1   Global Step: 20150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:38:38,706-Speed 3145.24 samples/sec   Loss 14.8254   LearningRate 0.0844   Epoch: 1   Global Step: 20160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:38:42,041-Speed 3071.20 samples/sec   Loss 14.7258   LearningRate 0.0844   Epoch: 1   Global Step: 20170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:38:45,335-Speed 3109.90 samples/sec   Loss 14.6586   LearningRate 0.0844   Epoch: 1   Global Step: 20180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:38:48,682-Speed 3060.38 samples/sec   Loss 14.6466   LearningRate 0.0844   Epoch: 1   Global Step: 20190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:38:52,044-Speed 3046.07 samples/sec   Loss 14.6600   LearningRate 0.0844   Epoch: 1   Global Step: 20200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:38:55,376-Speed 3074.45 samples/sec   Loss 14.7007   LearningRate 0.0844   Epoch: 1   Global Step: 20210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:38:58,710-Speed 3072.49 samples/sec   Loss 14.6659   LearningRate 0.0844   Epoch: 1   Global Step: 20220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:39:02,041-Speed 3075.02 samples/sec   Loss 14.8482   LearningRate 0.0844   Epoch: 1   Global Step: 20230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:39:05,394-Speed 3054.36 samples/sec   Loss 14.7433   LearningRate 0.0844   Epoch: 1   Global Step: 20240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:39:08,737-Speed 3064.46 samples/sec   Loss 14.7764   LearningRate 0.0844   Epoch: 1   Global Step: 20250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:39:12,047-Speed 3095.13 samples/sec   Loss 14.6043   LearningRate 0.0844   Epoch: 1   Global Step: 20260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:39:15,371-Speed 3081.30 samples/sec   Loss 14.6586   LearningRate 0.0843   Epoch: 1   Global Step: 20270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:39:18,666-Speed 3109.22 samples/sec   Loss 14.6181   LearningRate 0.0843   Epoch: 1   Global Step: 20280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:39:22,112-Speed 2972.41 samples/sec   Loss 14.5314   LearningRate 0.0843   Epoch: 1   Global Step: 20290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:39:25,439-Speed 3077.71 samples/sec   Loss 14.7132   LearningRate 0.0843   Epoch: 1   Global Step: 20300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:39:28,737-Speed 3106.68 samples/sec   Loss 14.7074   LearningRate 0.0843   Epoch: 1   Global Step: 20310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:39:32,053-Speed 3088.50 samples/sec   Loss 14.7239   LearningRate 0.0843   Epoch: 1   Global Step: 20320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:39:35,397-Speed 3063.26 samples/sec   Loss 14.7115   LearningRate 0.0843   Epoch: 1   Global Step: 20330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:39:38,786-Speed 3022.68 samples/sec   Loss 14.7684   LearningRate 0.0843   Epoch: 1   Global Step: 20340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:39:42,086-Speed 3104.00 samples/sec   Loss 14.6937   LearningRate 0.0843   Epoch: 1   Global Step: 20350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:39:45,447-Speed 3046.52 samples/sec   Loss 14.6261   LearningRate 0.0843   Epoch: 1   Global Step: 20360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:39:48,753-Speed 3098.64 samples/sec   Loss 14.6358   LearningRate 0.0843   Epoch: 1   Global Step: 20370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:39:52,088-Speed 3071.82 samples/sec   Loss 14.7170   LearningRate 0.0843   Epoch: 1   Global Step: 20380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:39:55,367-Speed 3123.67 samples/sec   Loss 14.6463   LearningRate 0.0843   Epoch: 1   Global Step: 20390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:39:58,703-Speed 3069.91 samples/sec   Loss 14.7828   LearningRate 0.0843   Epoch: 1   Global Step: 20400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:02,047-Speed 3063.47 samples/sec   Loss 14.7176   LearningRate 0.0842   Epoch: 1   Global Step: 20410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:05,400-Speed 3054.32 samples/sec   Loss 14.7955   LearningRate 0.0842   Epoch: 1   Global Step: 20420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:08,678-Speed 3125.15 samples/sec   Loss 14.6208   LearningRate 0.0842   Epoch: 1   Global Step: 20430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:11,988-Speed 3094.90 samples/sec   Loss 14.6637   LearningRate 0.0842   Epoch: 1   Global Step: 20440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:15,270-Speed 3120.46 samples/sec   Loss 14.7888   LearningRate 0.0842   Epoch: 1   Global Step: 20450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:18,564-Speed 3109.95 samples/sec   Loss 14.7999   LearningRate 0.0842   Epoch: 1   Global Step: 20460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:21,891-Speed 3078.52 samples/sec   Loss 14.6306   LearningRate 0.0842   Epoch: 1   Global Step: 20470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:40:25,184-Speed 3109.88 samples/sec   Loss 14.5868   LearningRate 0.0842   Epoch: 1   Global Step: 20480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:40:28,485-Speed 3103.53 samples/sec   Loss 14.8133   LearningRate 0.0842   Epoch: 1   Global Step: 20490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:40:31,746-Speed 3141.72 samples/sec   Loss 14.7401   LearningRate 0.0842   Epoch: 1   Global Step: 20500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:35,116-Speed 3039.44 samples/sec   Loss 14.6674   LearningRate 0.0842   Epoch: 1   Global Step: 20510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:38,383-Speed 3134.70 samples/sec   Loss 14.8685   LearningRate 0.0842   Epoch: 1   Global Step: 20520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:41,770-Speed 3024.41 samples/sec   Loss 14.5636   LearningRate 0.0842   Epoch: 1   Global Step: 20530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:45,066-Speed 3107.43 samples/sec   Loss 14.7227   LearningRate 0.0841   Epoch: 1   Global Step: 20540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:48,354-Speed 3115.67 samples/sec   Loss 14.6717   LearningRate 0.0841   Epoch: 1   Global Step: 20550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:51,756-Speed 3010.69 samples/sec   Loss 14.6161   LearningRate 0.0841   Epoch: 1   Global Step: 20560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:55,106-Speed 3057.66 samples/sec   Loss 14.6239   LearningRate 0.0841   Epoch: 1   Global Step: 20570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:40:58,437-Speed 3074.95 samples/sec   Loss 14.7745   LearningRate 0.0841   Epoch: 1   Global Step: 20580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:41:01,707-Speed 3132.76 samples/sec   Loss 14.6506   LearningRate 0.0841   Epoch: 1   Global Step: 20590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:41:05,037-Speed 3075.82 samples/sec   Loss 14.8112   LearningRate 0.0841   Epoch: 1   Global Step: 20600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:41:08,340-Speed 3101.21 samples/sec   Loss 14.6471   LearningRate 0.0841   Epoch: 1   Global Step: 20610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:41:11,619-Speed 3124.22 samples/sec   Loss 14.5714   LearningRate 0.0841   Epoch: 1   Global Step: 20620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:41:14,893-Speed 3128.08 samples/sec   Loss 14.7007   LearningRate 0.0841   Epoch: 1   Global Step: 20630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:41:18,221-Speed 3077.78 samples/sec   Loss 14.6620   LearningRate 0.0841   Epoch: 1   Global Step: 20640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:41:21,573-Speed 3055.99 samples/sec   Loss 14.4786   LearningRate 0.0841   Epoch: 1   Global Step: 20650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:41:24,828-Speed 3147.19 samples/sec   Loss 14.5958   LearningRate 0.0841   Epoch: 1   Global Step: 20660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:41:28,148-Speed 3085.50 samples/sec   Loss 14.4873   LearningRate 0.0841   Epoch: 1   Global Step: 20670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:41:31,430-Speed 3120.48 samples/sec   Loss 14.8662   LearningRate 0.0840   Epoch: 1   Global Step: 20680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:41:34,823-Speed 3019.91 samples/sec   Loss 14.4201   LearningRate 0.0840   Epoch: 1   Global Step: 20690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:41:38,146-Speed 3082.91 samples/sec   Loss 14.6640   LearningRate 0.0840   Epoch: 1   Global Step: 20700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:41:41,456-Speed 3094.32 samples/sec   Loss 14.6337   LearningRate 0.0840   Epoch: 1   Global Step: 20710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:41:44,808-Speed 3055.12 samples/sec   Loss 14.6363   LearningRate 0.0840   Epoch: 1   Global Step: 20720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:41:48,110-Speed 3103.02 samples/sec   Loss 14.5090   LearningRate 0.0840   Epoch: 1   Global Step: 20730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:41:51,459-Speed 3058.18 samples/sec   Loss 14.8014   LearningRate 0.0840   Epoch: 1   Global Step: 20740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:41:54,849-Speed 3021.85 samples/sec   Loss 14.9142   LearningRate 0.0840   Epoch: 1   Global Step: 20750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:41:58,159-Speed 3094.07 samples/sec   Loss 14.6756   LearningRate 0.0840   Epoch: 1   Global Step: 20760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:42:01,443-Speed 3118.86 samples/sec   Loss 14.6735   LearningRate 0.0840   Epoch: 1   Global Step: 20770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:42:04,744-Speed 3102.91 samples/sec   Loss 14.7607   LearningRate 0.0840   Epoch: 1   Global Step: 20780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:42:08,090-Speed 3061.84 samples/sec   Loss 14.8778   LearningRate 0.0840   Epoch: 1   Global Step: 20790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:42:11,421-Speed 3074.61 samples/sec   Loss 14.4739   LearningRate 0.0840   Epoch: 1   Global Step: 20800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:42:14,777-Speed 3052.22 samples/sec   Loss 14.7252   LearningRate 0.0839   Epoch: 1   Global Step: 20810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:42:18,055-Speed 3124.70 samples/sec   Loss 14.5783   LearningRate 0.0839   Epoch: 1   Global Step: 20820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:42:21,399-Speed 3063.38 samples/sec   Loss 14.5694   LearningRate 0.0839   Epoch: 1   Global Step: 20830   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 03:42:24,738-Speed 3067.21 samples/sec   Loss 14.4210   LearningRate 0.0839   Epoch: 1   Global Step: 20840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:42:28,041-Speed 3101.51 samples/sec   Loss 14.7088   LearningRate 0.0839   Epoch: 1   Global Step: 20850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:42:31,421-Speed 3030.76 samples/sec   Loss 14.5502   LearningRate 0.0839   Epoch: 1   Global Step: 20860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:42:34,822-Speed 3010.99 samples/sec   Loss 14.5541   LearningRate 0.0839   Epoch: 1   Global Step: 20870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:42:38,221-Speed 3014.06 samples/sec   Loss 14.4918   LearningRate 0.0839   Epoch: 1   Global Step: 20880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:42:41,533-Speed 3092.71 samples/sec   Loss 14.5543   LearningRate 0.0839   Epoch: 1   Global Step: 20890   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:42:44,814-Speed 3122.03 samples/sec   Loss 14.5946   LearningRate 0.0839   Epoch: 1   Global Step: 20900   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:42:48,086-Speed 3130.05 samples/sec   Loss 14.7262   LearningRate 0.0839   Epoch: 1   Global Step: 20910   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:42:51,442-Speed 3052.24 samples/sec   Loss 14.7581   LearningRate 0.0839   Epoch: 1   Global Step: 20920   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:42:54,799-Speed 3051.55 samples/sec   Loss 14.6292   LearningRate 0.0839   Epoch: 1   Global Step: 20930   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:42:58,110-Speed 3093.22 samples/sec   Loss 14.7280   LearningRate 0.0839   Epoch: 1   Global Step: 20940   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:43:01,471-Speed 3047.44 samples/sec   Loss 14.6739   LearningRate 0.0838   Epoch: 1   Global Step: 20950   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:43:04,810-Speed 3067.59 samples/sec   Loss 14.6481   LearningRate 0.0838   Epoch: 1   Global Step: 20960   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:43:08,117-Speed 3097.07 samples/sec   Loss 14.5514   LearningRate 0.0838   Epoch: 1   Global Step: 20970   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:43:11,409-Speed 3111.91 samples/sec   Loss 14.6499   LearningRate 0.0838   Epoch: 1   Global Step: 20980   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 03:43:14,787-Speed 3032.19 samples/sec   Loss 14.5205   LearningRate 0.0838   Epoch: 1   Global Step: 20990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:43:18,144-Speed 3051.06 samples/sec   Loss 14.5822   LearningRate 0.0838   Epoch: 1   Global Step: 21000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:43:21,491-Speed 3060.28 samples/sec   Loss 14.7102   LearningRate 0.0838   Epoch: 1   Global Step: 21010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:43:24,858-Speed 3041.70 samples/sec   Loss 14.6374   LearningRate 0.0838   Epoch: 1   Global Step: 21020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:43:28,191-Speed 3073.37 samples/sec   Loss 14.5087   LearningRate 0.0838   Epoch: 1   Global Step: 21030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:43:31,555-Speed 3044.61 samples/sec   Loss 14.5214   LearningRate 0.0838   Epoch: 1   Global Step: 21040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:43:34,884-Speed 3077.60 samples/sec   Loss 14.5940   LearningRate 0.0838   Epoch: 1   Global Step: 21050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:43:38,308-Speed 2990.81 samples/sec   Loss 14.6092   LearningRate 0.0838   Epoch: 1   Global Step: 21060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:43:41,702-Speed 3018.38 samples/sec   Loss 14.6307   LearningRate 0.0838   Epoch: 1   Global Step: 21070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:43:45,051-Speed 3058.81 samples/sec   Loss 14.8341   LearningRate 0.0837   Epoch: 1   Global Step: 21080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:43:48,392-Speed 3065.51 samples/sec   Loss 14.5790   LearningRate 0.0837   Epoch: 1   Global Step: 21090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:43:51,700-Speed 3097.40 samples/sec   Loss 14.5825   LearningRate 0.0837   Epoch: 1   Global Step: 21100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:43:55,050-Speed 3057.07 samples/sec   Loss 14.4933   LearningRate 0.0837   Epoch: 1   Global Step: 21110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:43:58,348-Speed 3106.35 samples/sec   Loss 14.5751   LearningRate 0.0837   Epoch: 1   Global Step: 21120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:01,624-Speed 3126.57 samples/sec   Loss 14.5990   LearningRate 0.0837   Epoch: 1   Global Step: 21130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:04,884-Speed 3141.23 samples/sec   Loss 14.6840   LearningRate 0.0837   Epoch: 1   Global Step: 21140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:08,225-Speed 3066.53 samples/sec   Loss 14.7286   LearningRate 0.0837   Epoch: 1   Global Step: 21150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:11,549-Speed 3081.54 samples/sec   Loss 14.6225   LearningRate 0.0837   Epoch: 1   Global Step: 21160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:14,856-Speed 3097.57 samples/sec   Loss 14.6222   LearningRate 0.0837   Epoch: 1   Global Step: 21170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:18,111-Speed 3146.22 samples/sec   Loss 14.5480   LearningRate 0.0837   Epoch: 1   Global Step: 21180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:21,422-Speed 3093.58 samples/sec   Loss 14.6329   LearningRate 0.0837   Epoch: 1   Global Step: 21190   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 03:44:24,763-Speed 3066.11 samples/sec   Loss 14.5026   LearningRate 0.0837   Epoch: 1   Global Step: 21200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:28,053-Speed 3113.77 samples/sec   Loss 14.6561   LearningRate 0.0837   Epoch: 1   Global Step: 21210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:31,360-Speed 3096.91 samples/sec   Loss 14.5102   LearningRate 0.0836   Epoch: 1   Global Step: 21220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:34,677-Speed 3088.06 samples/sec   Loss 14.4007   LearningRate 0.0836   Epoch: 1   Global Step: 21230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:38,026-Speed 3059.27 samples/sec   Loss 14.5292   LearningRate 0.0836   Epoch: 1   Global Step: 21240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:41,367-Speed 3066.13 samples/sec   Loss 14.5653   LearningRate 0.0836   Epoch: 1   Global Step: 21250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:44,690-Speed 3081.99 samples/sec   Loss 14.5409   LearningRate 0.0836   Epoch: 1   Global Step: 21260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:44:47,997-Speed 3097.15 samples/sec   Loss 14.5470   LearningRate 0.0836   Epoch: 1   Global Step: 21270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:44:51,328-Speed 3075.04 samples/sec   Loss 14.6583   LearningRate 0.0836   Epoch: 1   Global Step: 21280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:44:54,705-Speed 3034.99 samples/sec   Loss 14.4229   LearningRate 0.0836   Epoch: 1   Global Step: 21290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:44:58,028-Speed 3082.33 samples/sec   Loss 14.3797   LearningRate 0.0836   Epoch: 1   Global Step: 21300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:45:01,354-Speed 3080.11 samples/sec   Loss 14.7542   LearningRate 0.0836   Epoch: 1   Global Step: 21310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:45:04,603-Speed 3152.61 samples/sec   Loss 14.6863   LearningRate 0.0836   Epoch: 1   Global Step: 21320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:45:07,903-Speed 3102.84 samples/sec   Loss 14.5437   LearningRate 0.0836   Epoch: 1   Global Step: 21330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:45:11,209-Speed 3098.69 samples/sec   Loss 14.4919   LearningRate 0.0836   Epoch: 1   Global Step: 21340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:45:14,543-Speed 3072.65 samples/sec   Loss 14.6621   LearningRate 0.0835   Epoch: 1   Global Step: 21350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:45:17,920-Speed 3033.82 samples/sec   Loss 14.4564   LearningRate 0.0835   Epoch: 1   Global Step: 21360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:45:21,239-Speed 3085.79 samples/sec   Loss 14.5076   LearningRate 0.0835   Epoch: 1   Global Step: 21370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:45:24,562-Speed 3083.14 samples/sec   Loss 14.5288   LearningRate 0.0835   Epoch: 1   Global Step: 21380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:45:27,832-Speed 3133.11 samples/sec   Loss 14.5496   LearningRate 0.0835   Epoch: 1   Global Step: 21390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:45:31,158-Speed 3079.81 samples/sec   Loss 14.4816   LearningRate 0.0835   Epoch: 1   Global Step: 21400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:45:34,449-Speed 3112.18 samples/sec   Loss 14.2796   LearningRate 0.0835   Epoch: 1   Global Step: 21410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:45:37,810-Speed 3047.68 samples/sec   Loss 14.6034   LearningRate 0.0835   Epoch: 1   Global Step: 21420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:45:41,081-Speed 3131.13 samples/sec   Loss 14.5721   LearningRate 0.0835   Epoch: 1   Global Step: 21430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:45:44,348-Speed 3134.89 samples/sec   Loss 14.7100   LearningRate 0.0835   Epoch: 1   Global Step: 21440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:45:47,664-Speed 3089.84 samples/sec   Loss 14.4435   LearningRate 0.0835   Epoch: 1   Global Step: 21450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:45:50,997-Speed 3072.60 samples/sec   Loss 14.6635   LearningRate 0.0835   Epoch: 1   Global Step: 21460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:45:54,323-Speed 3079.55 samples/sec   Loss 14.5835   LearningRate 0.0835   Epoch: 1   Global Step: 21470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:45:57,607-Speed 3119.72 samples/sec   Loss 14.4896   LearningRate 0.0835   Epoch: 1   Global Step: 21480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:00,898-Speed 3112.62 samples/sec   Loss 14.4684   LearningRate 0.0834   Epoch: 1   Global Step: 21490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:04,222-Speed 3081.16 samples/sec   Loss 14.6357   LearningRate 0.0834   Epoch: 1   Global Step: 21500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:07,509-Speed 3116.47 samples/sec   Loss 14.5168   LearningRate 0.0834   Epoch: 1   Global Step: 21510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:10,827-Speed 3087.53 samples/sec   Loss 14.5067   LearningRate 0.0834   Epoch: 1   Global Step: 21520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:14,184-Speed 3051.51 samples/sec   Loss 14.5431   LearningRate 0.0834   Epoch: 1   Global Step: 21530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:17,515-Speed 3075.00 samples/sec   Loss 14.4804   LearningRate 0.0834   Epoch: 1   Global Step: 21540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:20,824-Speed 3095.46 samples/sec   Loss 14.6875   LearningRate 0.0834   Epoch: 1   Global Step: 21550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:24,156-Speed 3074.24 samples/sec   Loss 14.4842   LearningRate 0.0834   Epoch: 1   Global Step: 21560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:27,456-Speed 3103.29 samples/sec   Loss 14.5512   LearningRate 0.0834   Epoch: 1   Global Step: 21570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:46:30,759-Speed 3101.44 samples/sec   Loss 14.6993   LearningRate 0.0834   Epoch: 1   Global Step: 21580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:46:34,070-Speed 3093.33 samples/sec   Loss 14.6559   LearningRate 0.0834   Epoch: 1   Global Step: 21590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:46:37,335-Speed 3138.05 samples/sec   Loss 14.5502   LearningRate 0.0834   Epoch: 1   Global Step: 21600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:40,692-Speed 3050.32 samples/sec   Loss 14.5454   LearningRate 0.0834   Epoch: 1   Global Step: 21610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:43,978-Speed 3117.21 samples/sec   Loss 14.4663   LearningRate 0.0834   Epoch: 1   Global Step: 21620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:47,331-Speed 3055.71 samples/sec   Loss 14.5810   LearningRate 0.0833   Epoch: 1   Global Step: 21630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:50,629-Speed 3104.95 samples/sec   Loss 14.3747   LearningRate 0.0833   Epoch: 1   Global Step: 21640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:53,942-Speed 3091.81 samples/sec   Loss 14.5487   LearningRate 0.0833   Epoch: 1   Global Step: 21650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:46:57,233-Speed 3112.91 samples/sec   Loss 14.5956   LearningRate 0.0833   Epoch: 1   Global Step: 21660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:00,507-Speed 3128.49 samples/sec   Loss 14.5630   LearningRate 0.0833   Epoch: 1   Global Step: 21670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:03,818-Speed 3096.88 samples/sec   Loss 14.4429   LearningRate 0.0833   Epoch: 1   Global Step: 21680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:07,146-Speed 3077.49 samples/sec   Loss 14.4222   LearningRate 0.0833   Epoch: 1   Global Step: 21690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:10,500-Speed 3054.21 samples/sec   Loss 14.4657   LearningRate 0.0833   Epoch: 1   Global Step: 21700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:47:13,847-Speed 3060.69 samples/sec   Loss 14.5779   LearningRate 0.0833   Epoch: 1   Global Step: 21710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:47:17,142-Speed 3108.22 samples/sec   Loss 14.4246   LearningRate 0.0833   Epoch: 1   Global Step: 21720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:47:20,423-Speed 3122.15 samples/sec   Loss 14.4484   LearningRate 0.0833   Epoch: 1   Global Step: 21730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:47:23,751-Speed 3078.40 samples/sec   Loss 14.5739   LearningRate 0.0833   Epoch: 1   Global Step: 21740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:27,066-Speed 3089.64 samples/sec   Loss 14.5977   LearningRate 0.0833   Epoch: 1   Global Step: 21750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:30,373-Speed 3097.94 samples/sec   Loss 14.4657   LearningRate 0.0832   Epoch: 1   Global Step: 21760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:33,677-Speed 3100.20 samples/sec   Loss 14.4659   LearningRate 0.0832   Epoch: 1   Global Step: 21770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:37,009-Speed 3074.37 samples/sec   Loss 14.4349   LearningRate 0.0832   Epoch: 1   Global Step: 21780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:40,348-Speed 3067.46 samples/sec   Loss 14.4070   LearningRate 0.0832   Epoch: 1   Global Step: 21790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:43,704-Speed 3052.77 samples/sec   Loss 14.4341   LearningRate 0.0832   Epoch: 1   Global Step: 21800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:47,006-Speed 3101.59 samples/sec   Loss 14.4830   LearningRate 0.0832   Epoch: 1   Global Step: 21810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:50,321-Speed 3089.72 samples/sec   Loss 14.4176   LearningRate 0.0832   Epoch: 1   Global Step: 21820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:53,606-Speed 3118.84 samples/sec   Loss 14.3435   LearningRate 0.0832   Epoch: 1   Global Step: 21830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:47:56,928-Speed 3082.73 samples/sec   Loss 14.5090   LearningRate 0.0832   Epoch: 1   Global Step: 21840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:00,207-Speed 3124.04 samples/sec   Loss 14.4194   LearningRate 0.0832   Epoch: 1   Global Step: 21850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:03,505-Speed 3106.71 samples/sec   Loss 14.4517   LearningRate 0.0832   Epoch: 1   Global Step: 21860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:06,796-Speed 3112.38 samples/sec   Loss 14.5727   LearningRate 0.0832   Epoch: 1   Global Step: 21870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:10,055-Speed 3143.59 samples/sec   Loss 14.3664   LearningRate 0.0832   Epoch: 1   Global Step: 21880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:13,398-Speed 3063.51 samples/sec   Loss 14.5426   LearningRate 0.0832   Epoch: 1   Global Step: 21890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:16,703-Speed 3099.67 samples/sec   Loss 14.2990   LearningRate 0.0831   Epoch: 1   Global Step: 21900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:19,981-Speed 3124.87 samples/sec   Loss 14.4837   LearningRate 0.0831   Epoch: 1   Global Step: 21910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:23,284-Speed 3101.63 samples/sec   Loss 14.4545   LearningRate 0.0831   Epoch: 1   Global Step: 21920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:26,632-Speed 3059.37 samples/sec   Loss 14.6092   LearningRate 0.0831   Epoch: 1   Global Step: 21930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:29,989-Speed 3051.57 samples/sec   Loss 14.6305   LearningRate 0.0831   Epoch: 1   Global Step: 21940   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 03:48:33,335-Speed 3060.58 samples/sec   Loss 14.5512   LearningRate 0.0831   Epoch: 1   Global Step: 21950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:36,717-Speed 3029.48 samples/sec   Loss 14.2926   LearningRate 0.0831   Epoch: 1   Global Step: 21960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:40,025-Speed 3095.94 samples/sec   Loss 14.4356   LearningRate 0.0831   Epoch: 1   Global Step: 21970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:43,349-Speed 3080.97 samples/sec   Loss 14.2520   LearningRate 0.0831   Epoch: 1   Global Step: 21980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:46,734-Speed 3026.25 samples/sec   Loss 14.4511   LearningRate 0.0831   Epoch: 1   Global Step: 21990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:50,069-Speed 3071.53 samples/sec   Loss 14.4239   LearningRate 0.0831   Epoch: 1   Global Step: 22000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:53,373-Speed 3100.30 samples/sec   Loss 14.4226   LearningRate 0.0831   Epoch: 1   Global Step: 22010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:56,665-Speed 3111.66 samples/sec   Loss 14.4530   LearningRate 0.0831   Epoch: 1   Global Step: 22020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:48:59,976-Speed 3093.04 samples/sec   Loss 14.4441   LearningRate 0.0831   Epoch: 1   Global Step: 22030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:49:03,282-Speed 3098.96 samples/sec   Loss 14.4112   LearningRate 0.0830   Epoch: 1   Global Step: 22040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:49:06,616-Speed 3072.22 samples/sec   Loss 14.4487   LearningRate 0.0830   Epoch: 1   Global Step: 22050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:49:09,901-Speed 3117.40 samples/sec   Loss 14.4241   LearningRate 0.0830   Epoch: 1   Global Step: 22060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:49:13,201-Speed 3103.72 samples/sec   Loss 14.3417   LearningRate 0.0830   Epoch: 1   Global Step: 22070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:49:16,502-Speed 3103.33 samples/sec   Loss 14.4533   LearningRate 0.0830   Epoch: 1   Global Step: 22080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:49:19,788-Speed 3116.94 samples/sec   Loss 14.3496   LearningRate 0.0830   Epoch: 1   Global Step: 22090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:49:23,117-Speed 3077.61 samples/sec   Loss 14.4186   LearningRate 0.0830   Epoch: 1   Global Step: 22100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:49:26,506-Speed 3022.16 samples/sec   Loss 14.3280   LearningRate 0.0830   Epoch: 1   Global Step: 22110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:49:29,832-Speed 3079.90 samples/sec   Loss 14.3527   LearningRate 0.0830   Epoch: 1   Global Step: 22120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:49:33,149-Speed 3087.60 samples/sec   Loss 14.3493   LearningRate 0.0830   Epoch: 1   Global Step: 22130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:49:36,424-Speed 3127.35 samples/sec   Loss 14.3899   LearningRate 0.0830   Epoch: 1   Global Step: 22140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:49:39,766-Speed 3065.40 samples/sec   Loss 14.3795   LearningRate 0.0830   Epoch: 1   Global Step: 22150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:49:43,151-Speed 3025.85 samples/sec   Loss 14.4136   LearningRate 0.0830   Epoch: 1   Global Step: 22160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:49:46,472-Speed 3084.22 samples/sec   Loss 14.3866   LearningRate 0.0829   Epoch: 1   Global Step: 22170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:49:49,784-Speed 3093.25 samples/sec   Loss 14.3971   LearningRate 0.0829   Epoch: 1   Global Step: 22180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:49:53,101-Speed 3088.39 samples/sec   Loss 14.3842   LearningRate 0.0829   Epoch: 1   Global Step: 22190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:49:56,477-Speed 3033.57 samples/sec   Loss 14.2916   LearningRate 0.0829   Epoch: 1   Global Step: 22200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:49:59,814-Speed 3069.76 samples/sec   Loss 14.4727   LearningRate 0.0829   Epoch: 1   Global Step: 22210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:50:03,148-Speed 3071.74 samples/sec   Loss 14.4580   LearningRate 0.0829   Epoch: 1   Global Step: 22220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:50:06,447-Speed 3104.88 samples/sec   Loss 14.2196   LearningRate 0.0829   Epoch: 1   Global Step: 22230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:50:09,764-Speed 3087.92 samples/sec   Loss 14.4419   LearningRate 0.0829   Epoch: 1   Global Step: 22240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:50:13,048-Speed 3119.39 samples/sec   Loss 14.5166   LearningRate 0.0829   Epoch: 1   Global Step: 22250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:50:16,411-Speed 3045.81 samples/sec   Loss 14.3988   LearningRate 0.0829   Epoch: 1   Global Step: 22260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:50:19,835-Speed 2991.57 samples/sec   Loss 14.3931   LearningRate 0.0829   Epoch: 1   Global Step: 22270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:50:23,191-Speed 3052.65 samples/sec   Loss 14.5936   LearningRate 0.0829   Epoch: 1   Global Step: 22280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:50:26,518-Speed 3077.96 samples/sec   Loss 14.3923   LearningRate 0.0829   Epoch: 1   Global Step: 22290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:50:29,816-Speed 3105.97 samples/sec   Loss 14.4365   LearningRate 0.0829   Epoch: 1   Global Step: 22300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:50:33,133-Speed 3089.40 samples/sec   Loss 14.3916   LearningRate 0.0828   Epoch: 1   Global Step: 22310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:50:36,441-Speed 3096.16 samples/sec   Loss 14.3668   LearningRate 0.0828   Epoch: 1   Global Step: 22320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:50:39,735-Speed 3109.54 samples/sec   Loss 14.3242   LearningRate 0.0828   Epoch: 1   Global Step: 22330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:50:43,072-Speed 3069.11 samples/sec   Loss 14.2690   LearningRate 0.0828   Epoch: 1   Global Step: 22340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:50:46,434-Speed 3047.05 samples/sec   Loss 14.4222   LearningRate 0.0828   Epoch: 1   Global Step: 22350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:50:49,711-Speed 3125.03 samples/sec   Loss 14.5019   LearningRate 0.0828   Epoch: 1   Global Step: 22360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:50:53,043-Speed 3074.78 samples/sec   Loss 14.3849   LearningRate 0.0828   Epoch: 1   Global Step: 22370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:50:56,317-Speed 3127.93 samples/sec   Loss 14.3525   LearningRate 0.0828   Epoch: 1   Global Step: 22380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:50:59,631-Speed 3090.72 samples/sec   Loss 14.1990   LearningRate 0.0828   Epoch: 1   Global Step: 22390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:02,964-Speed 3073.34 samples/sec   Loss 14.4575   LearningRate 0.0828   Epoch: 1   Global Step: 22400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:06,310-Speed 3061.70 samples/sec   Loss 14.5265   LearningRate 0.0828   Epoch: 1   Global Step: 22410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:09,634-Speed 3081.51 samples/sec   Loss 14.4693   LearningRate 0.0828   Epoch: 1   Global Step: 22420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:12,900-Speed 3136.17 samples/sec   Loss 14.3083   LearningRate 0.0828   Epoch: 1   Global Step: 22430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:16,316-Speed 2998.34 samples/sec   Loss 14.3355   LearningRate 0.0827   Epoch: 1   Global Step: 22440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:51:19,583-Speed 3134.94 samples/sec   Loss 14.3864   LearningRate 0.0827   Epoch: 1   Global Step: 22450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:51:22,899-Speed 3089.80 samples/sec   Loss 14.3227   LearningRate 0.0827   Epoch: 1   Global Step: 22460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:26,157-Speed 3143.01 samples/sec   Loss 14.3566   LearningRate 0.0827   Epoch: 1   Global Step: 22470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:29,450-Speed 3110.22 samples/sec   Loss 14.3563   LearningRate 0.0827   Epoch: 1   Global Step: 22480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:32,722-Speed 3131.10 samples/sec   Loss 14.4385   LearningRate 0.0827   Epoch: 1   Global Step: 22490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:36,029-Speed 3097.57 samples/sec   Loss 14.3251   LearningRate 0.0827   Epoch: 1   Global Step: 22500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:39,378-Speed 3058.42 samples/sec   Loss 14.2929   LearningRate 0.0827   Epoch: 1   Global Step: 22510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:42,666-Speed 3114.84 samples/sec   Loss 14.1498   LearningRate 0.0827   Epoch: 1   Global Step: 22520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:45,950-Speed 3118.63 samples/sec   Loss 14.2621   LearningRate 0.0827   Epoch: 1   Global Step: 22530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:49,278-Speed 3077.45 samples/sec   Loss 14.3252   LearningRate 0.0827   Epoch: 1   Global Step: 22540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:52,657-Speed 3031.93 samples/sec   Loss 14.2623   LearningRate 0.0827   Epoch: 1   Global Step: 22550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:51:55,993-Speed 3069.79 samples/sec   Loss 14.4426   LearningRate 0.0827   Epoch: 1   Global Step: 22560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:51:59,309-Speed 3089.41 samples/sec   Loss 14.2747   LearningRate 0.0827   Epoch: 1   Global Step: 22570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:02,595-Speed 3116.69 samples/sec   Loss 14.3179   LearningRate 0.0826   Epoch: 1   Global Step: 22580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:05,992-Speed 3015.88 samples/sec   Loss 14.4433   LearningRate 0.0826   Epoch: 1   Global Step: 22590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:09,298-Speed 3098.34 samples/sec   Loss 14.5233   LearningRate 0.0826   Epoch: 1   Global Step: 22600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:12,610-Speed 3092.41 samples/sec   Loss 14.5470   LearningRate 0.0826   Epoch: 1   Global Step: 22610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:15,916-Speed 3098.32 samples/sec   Loss 14.3838   LearningRate 0.0826   Epoch: 1   Global Step: 22620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:19,195-Speed 3122.97 samples/sec   Loss 14.4619   LearningRate 0.0826   Epoch: 1   Global Step: 22630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:22,525-Speed 3076.47 samples/sec   Loss 14.5168   LearningRate 0.0826   Epoch: 1   Global Step: 22640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:25,857-Speed 3074.11 samples/sec   Loss 14.3839   LearningRate 0.0826   Epoch: 1   Global Step: 22650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:29,168-Speed 3093.76 samples/sec   Loss 14.2067   LearningRate 0.0826   Epoch: 1   Global Step: 22660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:32,464-Speed 3108.18 samples/sec   Loss 14.4621   LearningRate 0.0826   Epoch: 1   Global Step: 22670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:35,760-Speed 3107.28 samples/sec   Loss 14.2409   LearningRate 0.0826   Epoch: 1   Global Step: 22680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:39,046-Speed 3118.03 samples/sec   Loss 14.2186   LearningRate 0.0826   Epoch: 1   Global Step: 22690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:42,344-Speed 3106.49 samples/sec   Loss 14.1672   LearningRate 0.0826   Epoch: 1   Global Step: 22700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:45,664-Speed 3084.93 samples/sec   Loss 14.3629   LearningRate 0.0826   Epoch: 1   Global Step: 22710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:48,997-Speed 3072.83 samples/sec   Loss 14.5362   LearningRate 0.0825   Epoch: 1   Global Step: 22720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:52,299-Speed 3102.19 samples/sec   Loss 14.3670   LearningRate 0.0825   Epoch: 1   Global Step: 22730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:55,619-Speed 3085.84 samples/sec   Loss 14.3941   LearningRate 0.0825   Epoch: 1   Global Step: 22740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:52:58,985-Speed 3042.71 samples/sec   Loss 14.1859   LearningRate 0.0825   Epoch: 1   Global Step: 22750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:02,337-Speed 3056.13 samples/sec   Loss 14.4485   LearningRate 0.0825   Epoch: 1   Global Step: 22760   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 03:53:05,601-Speed 3137.38 samples/sec   Loss 14.4314   LearningRate 0.0825   Epoch: 1   Global Step: 22770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:08,926-Speed 3081.11 samples/sec   Loss 14.4876   LearningRate 0.0825   Epoch: 1   Global Step: 22780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:12,295-Speed 3040.25 samples/sec   Loss 14.2590   LearningRate 0.0825   Epoch: 1   Global Step: 22790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:15,640-Speed 3062.62 samples/sec   Loss 14.2038   LearningRate 0.0825   Epoch: 1   Global Step: 22800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:18,923-Speed 3119.49 samples/sec   Loss 14.3631   LearningRate 0.0825   Epoch: 1   Global Step: 22810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:22,239-Speed 3089.22 samples/sec   Loss 14.2636   LearningRate 0.0825   Epoch: 1   Global Step: 22820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:25,571-Speed 3073.57 samples/sec   Loss 14.2379   LearningRate 0.0825   Epoch: 1   Global Step: 22830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:28,860-Speed 3114.05 samples/sec   Loss 14.2151   LearningRate 0.0825   Epoch: 1   Global Step: 22840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:32,121-Speed 3141.91 samples/sec   Loss 14.3020   LearningRate 0.0824   Epoch: 1   Global Step: 22850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:35,474-Speed 3053.98 samples/sec   Loss 14.2215   LearningRate 0.0824   Epoch: 1   Global Step: 22860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:38,728-Speed 3148.58 samples/sec   Loss 14.3355   LearningRate 0.0824   Epoch: 1   Global Step: 22870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:42,022-Speed 3109.69 samples/sec   Loss 14.2646   LearningRate 0.0824   Epoch: 1   Global Step: 22880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:45,334-Speed 3092.20 samples/sec   Loss 14.2900   LearningRate 0.0824   Epoch: 1   Global Step: 22890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:49,235-Speed 2625.57 samples/sec   Loss 14.2313   LearningRate 0.0824   Epoch: 1   Global Step: 22900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:52,513-Speed 3125.33 samples/sec   Loss 14.2814   LearningRate 0.0824   Epoch: 1   Global Step: 22910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:55,816-Speed 3101.07 samples/sec   Loss 14.2241   LearningRate 0.0824   Epoch: 1   Global Step: 22920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:53:59,156-Speed 3066.36 samples/sec   Loss 14.3420   LearningRate 0.0824   Epoch: 1   Global Step: 22930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:02,461-Speed 3099.48 samples/sec   Loss 14.2696   LearningRate 0.0824   Epoch: 1   Global Step: 22940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:05,836-Speed 3034.55 samples/sec   Loss 14.4210   LearningRate 0.0824   Epoch: 1   Global Step: 22950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:09,223-Speed 3024.94 samples/sec   Loss 14.2564   LearningRate 0.0824   Epoch: 1   Global Step: 22960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:12,543-Speed 3085.53 samples/sec   Loss 14.4287   LearningRate 0.0824   Epoch: 1   Global Step: 22970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:15,873-Speed 3075.68 samples/sec   Loss 14.2234   LearningRate 0.0824   Epoch: 1   Global Step: 22980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:19,213-Speed 3066.82 samples/sec   Loss 14.2941   LearningRate 0.0823   Epoch: 1   Global Step: 22990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:22,549-Speed 3070.90 samples/sec   Loss 14.4140   LearningRate 0.0823   Epoch: 1   Global Step: 23000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:25,877-Speed 3077.79 samples/sec   Loss 14.4207   LearningRate 0.0823   Epoch: 1   Global Step: 23010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:29,224-Speed 3060.10 samples/sec   Loss 14.5048   LearningRate 0.0823   Epoch: 1   Global Step: 23020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:32,516-Speed 3111.46 samples/sec   Loss 14.3636   LearningRate 0.0823   Epoch: 1   Global Step: 23030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:35,785-Speed 3133.07 samples/sec   Loss 14.2209   LearningRate 0.0823   Epoch: 1   Global Step: 23040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:39,081-Speed 3108.51 samples/sec   Loss 14.3992   LearningRate 0.0823   Epoch: 1   Global Step: 23050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:42,419-Speed 3068.29 samples/sec   Loss 14.2043   LearningRate 0.0823   Epoch: 1   Global Step: 23060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:45,698-Speed 3124.40 samples/sec   Loss 14.2409   LearningRate 0.0823   Epoch: 1   Global Step: 23070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:49,000-Speed 3102.17 samples/sec   Loss 14.1932   LearningRate 0.0823   Epoch: 1   Global Step: 23080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:52,311-Speed 3093.35 samples/sec   Loss 14.3510   LearningRate 0.0823   Epoch: 1   Global Step: 23090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:55,686-Speed 3034.60 samples/sec   Loss 14.4950   LearningRate 0.0823   Epoch: 1   Global Step: 23100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:54:58,955-Speed 3133.44 samples/sec   Loss 14.2994   LearningRate 0.0823   Epoch: 1   Global Step: 23110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:55:02,291-Speed 3070.80 samples/sec   Loss 14.2673   LearningRate 0.0823   Epoch: 1   Global Step: 23120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:55:05,654-Speed 3046.12 samples/sec   Loss 14.4528   LearningRate 0.0822   Epoch: 1   Global Step: 23130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:55:08,950-Speed 3106.79 samples/sec   Loss 14.3254   LearningRate 0.0822   Epoch: 1   Global Step: 23140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:55:15,736-Speed 1509.48 samples/sec   Loss 14.3521   LearningRate 0.0822   Epoch: 1   Global Step: 23150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:55:19,046-Speed 3094.23 samples/sec   Loss 14.3058   LearningRate 0.0822   Epoch: 1   Global Step: 23160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:55:22,385-Speed 3067.71 samples/sec   Loss 14.1786   LearningRate 0.0822   Epoch: 1   Global Step: 23170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:55:25,671-Speed 3118.10 samples/sec   Loss 14.3225   LearningRate 0.0822   Epoch: 1   Global Step: 23180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:55:29,007-Speed 3070.54 samples/sec   Loss 14.3567   LearningRate 0.0822   Epoch: 1   Global Step: 23190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:55:32,345-Speed 3068.35 samples/sec   Loss 14.3093   LearningRate 0.0822   Epoch: 1   Global Step: 23200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:55:35,699-Speed 3053.85 samples/sec   Loss 14.2388   LearningRate 0.0822   Epoch: 1   Global Step: 23210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:55:39,084-Speed 3026.62 samples/sec   Loss 14.4045   LearningRate 0.0822   Epoch: 1   Global Step: 23220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:55:42,479-Speed 3016.54 samples/sec   Loss 14.3020   LearningRate 0.0822   Epoch: 1   Global Step: 23230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:55:45,796-Speed 3087.87 samples/sec   Loss 14.2691   LearningRate 0.0822   Epoch: 1   Global Step: 23240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:55:49,137-Speed 3066.61 samples/sec   Loss 14.3569   LearningRate 0.0822   Epoch: 1   Global Step: 23250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:55:52,442-Speed 3098.89 samples/sec   Loss 14.0817   LearningRate 0.0822   Epoch: 1   Global Step: 23260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:55:55,708-Speed 3135.92 samples/sec   Loss 14.2021   LearningRate 0.0821   Epoch: 1   Global Step: 23270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:55:59,040-Speed 3074.65 samples/sec   Loss 14.1429   LearningRate 0.0821   Epoch: 1   Global Step: 23280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:56:02,389-Speed 3058.44 samples/sec   Loss 14.3212   LearningRate 0.0821   Epoch: 1   Global Step: 23290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:56:05,711-Speed 3083.38 samples/sec   Loss 14.3197   LearningRate 0.0821   Epoch: 1   Global Step: 23300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:56:09,041-Speed 3075.82 samples/sec   Loss 14.2269   LearningRate 0.0821   Epoch: 1   Global Step: 23310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:56:12,361-Speed 3085.25 samples/sec   Loss 14.2044   LearningRate 0.0821   Epoch: 1   Global Step: 23320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:56:15,613-Speed 3149.41 samples/sec   Loss 14.3074   LearningRate 0.0821   Epoch: 1   Global Step: 23330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:56:18,941-Speed 3078.50 samples/sec   Loss 14.1392   LearningRate 0.0821   Epoch: 1   Global Step: 23340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:56:22,281-Speed 3066.03 samples/sec   Loss 14.2299   LearningRate 0.0821   Epoch: 1   Global Step: 23350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:56:25,628-Speed 3060.57 samples/sec   Loss 14.1074   LearningRate 0.0821   Epoch: 1   Global Step: 23360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:56:28,969-Speed 3066.14 samples/sec   Loss 14.4764   LearningRate 0.0821   Epoch: 1   Global Step: 23370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:56:32,291-Speed 3084.32 samples/sec   Loss 14.3077   LearningRate 0.0821   Epoch: 1   Global Step: 23380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:56:35,610-Speed 3085.35 samples/sec   Loss 14.2924   LearningRate 0.0821   Epoch: 1   Global Step: 23390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:56:38,962-Speed 3056.03 samples/sec   Loss 14.3064   LearningRate 0.0820   Epoch: 1   Global Step: 23400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:56:42,250-Speed 3115.25 samples/sec   Loss 14.2357   LearningRate 0.0820   Epoch: 1   Global Step: 23410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:56:45,589-Speed 3068.22 samples/sec   Loss 14.1190   LearningRate 0.0820   Epoch: 1   Global Step: 23420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:56:48,943-Speed 3053.68 samples/sec   Loss 14.2171   LearningRate 0.0820   Epoch: 1   Global Step: 23430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:56:52,347-Speed 3010.25 samples/sec   Loss 14.1840   LearningRate 0.0820   Epoch: 1   Global Step: 23440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:56:55,683-Speed 3070.51 samples/sec   Loss 14.1915   LearningRate 0.0820   Epoch: 1   Global Step: 23450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:56:59,023-Speed 3067.75 samples/sec   Loss 14.2344   LearningRate 0.0820   Epoch: 1   Global Step: 23460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:57:02,376-Speed 3054.74 samples/sec   Loss 14.1686   LearningRate 0.0820   Epoch: 1   Global Step: 23470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:57:05,729-Speed 3054.87 samples/sec   Loss 14.0988   LearningRate 0.0820   Epoch: 1   Global Step: 23480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:57:09,100-Speed 3038.58 samples/sec   Loss 14.4230   LearningRate 0.0820   Epoch: 1   Global Step: 23490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:57:12,413-Speed 3092.00 samples/sec   Loss 14.0939   LearningRate 0.0820   Epoch: 1   Global Step: 23500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:57:15,758-Speed 3062.55 samples/sec   Loss 14.2228   LearningRate 0.0820   Epoch: 1   Global Step: 23510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:57:19,074-Speed 3089.59 samples/sec   Loss 14.2312   LearningRate 0.0820   Epoch: 1   Global Step: 23520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:57:22,427-Speed 3054.21 samples/sec   Loss 14.2141   LearningRate 0.0820   Epoch: 1   Global Step: 23530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:57:25,713-Speed 3116.98 samples/sec   Loss 14.3136   LearningRate 0.0819   Epoch: 1   Global Step: 23540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:57:29,105-Speed 3020.06 samples/sec   Loss 14.3235   LearningRate 0.0819   Epoch: 1   Global Step: 23550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:57:32,469-Speed 3044.90 samples/sec   Loss 14.0965   LearningRate 0.0819   Epoch: 1   Global Step: 23560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:57:35,787-Speed 3086.80 samples/sec   Loss 14.3022   LearningRate 0.0819   Epoch: 1   Global Step: 23570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:57:39,106-Speed 3086.42 samples/sec   Loss 14.1621   LearningRate 0.0819   Epoch: 1   Global Step: 23580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:57:42,399-Speed 3110.30 samples/sec   Loss 14.2540   LearningRate 0.0819   Epoch: 1   Global Step: 23590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:57:45,697-Speed 3106.50 samples/sec   Loss 14.3664   LearningRate 0.0819   Epoch: 1   Global Step: 23600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:57:48,968-Speed 3131.09 samples/sec   Loss 14.4002   LearningRate 0.0819   Epoch: 1   Global Step: 23610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:57:52,316-Speed 3059.45 samples/sec   Loss 14.2801   LearningRate 0.0819   Epoch: 1   Global Step: 23620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:57:55,663-Speed 3060.27 samples/sec   Loss 14.1632   LearningRate 0.0819   Epoch: 1   Global Step: 23630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:57:59,019-Speed 3052.09 samples/sec   Loss 14.1775   LearningRate 0.0819   Epoch: 1   Global Step: 23640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:58:02,310-Speed 3112.86 samples/sec   Loss 14.3684   LearningRate 0.0819   Epoch: 1   Global Step: 23650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:58:05,634-Speed 3081.11 samples/sec   Loss 14.3534   LearningRate 0.0819   Epoch: 1   Global Step: 23660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:58:08,922-Speed 3115.55 samples/sec   Loss 14.1516   LearningRate 0.0819   Epoch: 1   Global Step: 23670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:58:12,229-Speed 3097.72 samples/sec   Loss 14.1133   LearningRate 0.0818   Epoch: 1   Global Step: 23680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:15,517-Speed 3115.62 samples/sec   Loss 14.0734   LearningRate 0.0818   Epoch: 1   Global Step: 23690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:18,829-Speed 3092.42 samples/sec   Loss 14.1004   LearningRate 0.0818   Epoch: 1   Global Step: 23700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:22,132-Speed 3101.33 samples/sec   Loss 14.3764   LearningRate 0.0818   Epoch: 1   Global Step: 23710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:25,506-Speed 3035.54 samples/sec   Loss 14.1769   LearningRate 0.0818   Epoch: 1   Global Step: 23720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:28,826-Speed 3085.34 samples/sec   Loss 14.1682   LearningRate 0.0818   Epoch: 1   Global Step: 23730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:32,120-Speed 3109.59 samples/sec   Loss 14.2497   LearningRate 0.0818   Epoch: 1   Global Step: 23740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:35,430-Speed 3094.31 samples/sec   Loss 14.2881   LearningRate 0.0818   Epoch: 1   Global Step: 23750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:38,758-Speed 3077.92 samples/sec   Loss 14.3298   LearningRate 0.0818   Epoch: 1   Global Step: 23760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:42,058-Speed 3104.82 samples/sec   Loss 14.2256   LearningRate 0.0818   Epoch: 1   Global Step: 23770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:45,474-Speed 2998.23 samples/sec   Loss 14.2370   LearningRate 0.0818   Epoch: 1   Global Step: 23780   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 03:58:48,845-Speed 3038.88 samples/sec   Loss 14.3913   LearningRate 0.0818   Epoch: 1   Global Step: 23790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:52,192-Speed 3060.31 samples/sec   Loss 14.1982   LearningRate 0.0818   Epoch: 1   Global Step: 23800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:55,535-Speed 3063.99 samples/sec   Loss 14.1415   LearningRate 0.0817   Epoch: 1   Global Step: 23810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:58:58,910-Speed 3035.66 samples/sec   Loss 14.3207   LearningRate 0.0817   Epoch: 1   Global Step: 23820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:59:02,308-Speed 3013.76 samples/sec   Loss 14.0917   LearningRate 0.0817   Epoch: 1   Global Step: 23830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:59:05,599-Speed 3112.83 samples/sec   Loss 14.1513   LearningRate 0.0817   Epoch: 1   Global Step: 23840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:59:08,925-Speed 3079.58 samples/sec   Loss 14.2206   LearningRate 0.0817   Epoch: 1   Global Step: 23850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:59:12,258-Speed 3072.99 samples/sec   Loss 14.2132   LearningRate 0.0817   Epoch: 1   Global Step: 23860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:59:15,573-Speed 3089.94 samples/sec   Loss 14.2393   LearningRate 0.0817   Epoch: 1   Global Step: 23870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:59:18,877-Speed 3100.51 samples/sec   Loss 14.1913   LearningRate 0.0817   Epoch: 1   Global Step: 23880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:59:22,125-Speed 3154.02 samples/sec   Loss 14.0365   LearningRate 0.0817   Epoch: 1   Global Step: 23890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:59:25,436-Speed 3093.70 samples/sec   Loss 14.1937   LearningRate 0.0817   Epoch: 1   Global Step: 23900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:59:28,705-Speed 3132.88 samples/sec   Loss 14.2114   LearningRate 0.0817   Epoch: 1   Global Step: 23910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 03:59:32,018-Speed 3092.56 samples/sec   Loss 14.3223   LearningRate 0.0817   Epoch: 1   Global Step: 23920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:59:35,342-Speed 3081.62 samples/sec   Loss 14.2975   LearningRate 0.0817   Epoch: 1   Global Step: 23930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:59:38,705-Speed 3045.67 samples/sec   Loss 14.0578   LearningRate 0.0817   Epoch: 1   Global Step: 23940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:59:42,081-Speed 3034.02 samples/sec   Loss 14.1143   LearningRate 0.0816   Epoch: 1   Global Step: 23950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:59:45,409-Speed 3078.04 samples/sec   Loss 14.1265   LearningRate 0.0816   Epoch: 1   Global Step: 23960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:59:48,716-Speed 3097.48 samples/sec   Loss 14.2224   LearningRate 0.0816   Epoch: 1   Global Step: 23970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:59:51,997-Speed 3121.62 samples/sec   Loss 14.1548   LearningRate 0.0816   Epoch: 1   Global Step: 23980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:59:55,318-Speed 3084.62 samples/sec   Loss 14.1225   LearningRate 0.0816   Epoch: 1   Global Step: 23990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 03:59:58,680-Speed 3046.79 samples/sec   Loss 14.1173   LearningRate 0.0816   Epoch: 1   Global Step: 24000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:00:02,023-Speed 3064.67 samples/sec   Loss 14.2402   LearningRate 0.0816   Epoch: 1   Global Step: 24010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:00:05,308-Speed 3117.21 samples/sec   Loss 14.2400   LearningRate 0.0816   Epoch: 1   Global Step: 24020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:00:08,642-Speed 3072.53 samples/sec   Loss 13.9743   LearningRate 0.0816   Epoch: 1   Global Step: 24030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:00:11,959-Speed 3088.17 samples/sec   Loss 14.2410   LearningRate 0.0816   Epoch: 1   Global Step: 24040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:00:15,299-Speed 3066.74 samples/sec   Loss 14.1693   LearningRate 0.0816   Epoch: 1   Global Step: 24050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:00:18,619-Speed 3085.81 samples/sec   Loss 14.2213   LearningRate 0.0816   Epoch: 1   Global Step: 24060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:00:21,906-Speed 3116.79 samples/sec   Loss 14.2529   LearningRate 0.0816   Epoch: 1   Global Step: 24070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:00:25,289-Speed 3027.33 samples/sec   Loss 14.0483   LearningRate 0.0816   Epoch: 1   Global Step: 24080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:00:28,578-Speed 3115.05 samples/sec   Loss 14.1187   LearningRate 0.0815   Epoch: 1   Global Step: 24090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:00:31,879-Speed 3102.35 samples/sec   Loss 14.1120   LearningRate 0.0815   Epoch: 1   Global Step: 24100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:00:35,140-Speed 3141.99 samples/sec   Loss 14.1148   LearningRate 0.0815   Epoch: 1   Global Step: 24110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:00:38,449-Speed 3095.26 samples/sec   Loss 14.0641   LearningRate 0.0815   Epoch: 1   Global Step: 24120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:00:41,700-Speed 3150.78 samples/sec   Loss 14.0759   LearningRate 0.0815   Epoch: 1   Global Step: 24130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:00:44,982-Speed 3120.64 samples/sec   Loss 14.1615   LearningRate 0.0815   Epoch: 1   Global Step: 24140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:00:48,299-Speed 3088.73 samples/sec   Loss 14.1997   LearningRate 0.0815   Epoch: 1   Global Step: 24150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:00:51,713-Speed 2999.87 samples/sec   Loss 14.2136   LearningRate 0.0815   Epoch: 1   Global Step: 24160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:00:55,004-Speed 3113.08 samples/sec   Loss 14.1837   LearningRate 0.0815   Epoch: 1   Global Step: 24170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:00:58,328-Speed 3081.43 samples/sec   Loss 14.2340   LearningRate 0.0815   Epoch: 1   Global Step: 24180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:01,599-Speed 3131.12 samples/sec   Loss 14.1801   LearningRate 0.0815   Epoch: 1   Global Step: 24190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:04,906-Speed 3097.63 samples/sec   Loss 14.2831   LearningRate 0.0815   Epoch: 1   Global Step: 24200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:08,178-Speed 3130.71 samples/sec   Loss 14.2598   LearningRate 0.0815   Epoch: 1   Global Step: 24210   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 04:01:11,521-Speed 3063.17 samples/sec   Loss 14.1664   LearningRate 0.0815   Epoch: 1   Global Step: 24220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:14,869-Speed 3060.02 samples/sec   Loss 14.2294   LearningRate 0.0814   Epoch: 1   Global Step: 24230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:18,199-Speed 3076.04 samples/sec   Loss 14.2449   LearningRate 0.0814   Epoch: 1   Global Step: 24240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:21,533-Speed 3072.13 samples/sec   Loss 14.0886   LearningRate 0.0814   Epoch: 1   Global Step: 24250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:24,880-Speed 3060.20 samples/sec   Loss 14.2430   LearningRate 0.0814   Epoch: 1   Global Step: 24260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:28,177-Speed 3106.28 samples/sec   Loss 14.0722   LearningRate 0.0814   Epoch: 1   Global Step: 24270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:31,440-Speed 3140.12 samples/sec   Loss 14.0344   LearningRate 0.0814   Epoch: 1   Global Step: 24280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:34,697-Speed 3144.38 samples/sec   Loss 14.2088   LearningRate 0.0814   Epoch: 1   Global Step: 24290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:38,017-Speed 3084.95 samples/sec   Loss 14.1396   LearningRate 0.0814   Epoch: 1   Global Step: 24300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:41,283-Speed 3136.72 samples/sec   Loss 14.1472   LearningRate 0.0814   Epoch: 1   Global Step: 24310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:44,540-Speed 3145.59 samples/sec   Loss 14.2248   LearningRate 0.0814   Epoch: 1   Global Step: 24320   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 04:01:47,817-Speed 3125.61 samples/sec   Loss 14.1106   LearningRate 0.0814   Epoch: 1   Global Step: 24330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:51,102-Speed 3118.09 samples/sec   Loss 14.0878   LearningRate 0.0814   Epoch: 1   Global Step: 24340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:54,403-Speed 3103.37 samples/sec   Loss 14.2967   LearningRate 0.0814   Epoch: 1   Global Step: 24350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:01:57,742-Speed 3067.80 samples/sec   Loss 14.1822   LearningRate 0.0813   Epoch: 1   Global Step: 24360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:01,052-Speed 3094.64 samples/sec   Loss 14.1394   LearningRate 0.0813   Epoch: 1   Global Step: 24370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:04,411-Speed 3049.26 samples/sec   Loss 14.1319   LearningRate 0.0813   Epoch: 1   Global Step: 24380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:07,762-Speed 3056.72 samples/sec   Loss 14.1103   LearningRate 0.0813   Epoch: 1   Global Step: 24390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:11,043-Speed 3121.63 samples/sec   Loss 14.1855   LearningRate 0.0813   Epoch: 1   Global Step: 24400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:14,328-Speed 3118.40 samples/sec   Loss 14.2588   LearningRate 0.0813   Epoch: 1   Global Step: 24410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:17,697-Speed 3040.03 samples/sec   Loss 13.9730   LearningRate 0.0813   Epoch: 1   Global Step: 24420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:21,020-Speed 3082.48 samples/sec   Loss 14.1981   LearningRate 0.0813   Epoch: 1   Global Step: 24430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:24,376-Speed 3052.23 samples/sec   Loss 14.2257   LearningRate 0.0813   Epoch: 1   Global Step: 24440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:27,729-Speed 3054.88 samples/sec   Loss 14.1873   LearningRate 0.0813   Epoch: 1   Global Step: 24450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:31,032-Speed 3100.91 samples/sec   Loss 14.0762   LearningRate 0.0813   Epoch: 1   Global Step: 24460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:34,305-Speed 3129.34 samples/sec   Loss 14.1661   LearningRate 0.0813   Epoch: 1   Global Step: 24470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:37,634-Speed 3077.39 samples/sec   Loss 13.9165   LearningRate 0.0813   Epoch: 1   Global Step: 24480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:40,994-Speed 3048.52 samples/sec   Loss 14.1839   LearningRate 0.0813   Epoch: 1   Global Step: 24490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:44,337-Speed 3064.65 samples/sec   Loss 14.0174   LearningRate 0.0812   Epoch: 1   Global Step: 24500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:47,685-Speed 3059.72 samples/sec   Loss 14.1252   LearningRate 0.0812   Epoch: 1   Global Step: 24510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:50,986-Speed 3102.92 samples/sec   Loss 14.1293   LearningRate 0.0812   Epoch: 1   Global Step: 24520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:54,246-Speed 3142.59 samples/sec   Loss 14.0211   LearningRate 0.0812   Epoch: 1   Global Step: 24530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:02:57,550-Speed 3100.16 samples/sec   Loss 14.1293   LearningRate 0.0812   Epoch: 1   Global Step: 24540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:00,836-Speed 3117.87 samples/sec   Loss 14.1670   LearningRate 0.0812   Epoch: 1   Global Step: 24550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:04,114-Speed 3124.28 samples/sec   Loss 14.1846   LearningRate 0.0812   Epoch: 1   Global Step: 24560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:07,433-Speed 3086.63 samples/sec   Loss 14.2569   LearningRate 0.0812   Epoch: 1   Global Step: 24570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:10,731-Speed 3105.18 samples/sec   Loss 14.1557   LearningRate 0.0812   Epoch: 1   Global Step: 24580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:14,048-Speed 3088.67 samples/sec   Loss 13.9939   LearningRate 0.0812   Epoch: 1   Global Step: 24590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:17,413-Speed 3044.36 samples/sec   Loss 14.0813   LearningRate 0.0812   Epoch: 1   Global Step: 24600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:20,754-Speed 3065.36 samples/sec   Loss 14.1745   LearningRate 0.0812   Epoch: 1   Global Step: 24610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:24,104-Speed 3057.58 samples/sec   Loss 14.2517   LearningRate 0.0812   Epoch: 1   Global Step: 24620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:27,417-Speed 3092.49 samples/sec   Loss 14.0991   LearningRate 0.0812   Epoch: 1   Global Step: 24630   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 04:03:30,767-Speed 3057.17 samples/sec   Loss 14.0544   LearningRate 0.0811   Epoch: 1   Global Step: 24640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:34,092-Speed 3080.97 samples/sec   Loss 14.0559   LearningRate 0.0811   Epoch: 1   Global Step: 24650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:37,425-Speed 3072.73 samples/sec   Loss 14.2144   LearningRate 0.0811   Epoch: 1   Global Step: 24660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:40,789-Speed 3044.91 samples/sec   Loss 14.0039   LearningRate 0.0811   Epoch: 1   Global Step: 24670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:44,052-Speed 3139.64 samples/sec   Loss 14.1149   LearningRate 0.0811   Epoch: 1   Global Step: 24680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:47,402-Speed 3057.60 samples/sec   Loss 14.1923   LearningRate 0.0811   Epoch: 1   Global Step: 24690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:50,701-Speed 3104.91 samples/sec   Loss 14.0607   LearningRate 0.0811   Epoch: 1   Global Step: 24700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:53,975-Speed 3128.97 samples/sec   Loss 14.0129   LearningRate 0.0811   Epoch: 1   Global Step: 24710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:03:57,299-Speed 3082.32 samples/sec   Loss 14.0692   LearningRate 0.0811   Epoch: 1   Global Step: 24720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:04:00,614-Speed 3089.37 samples/sec   Loss 14.2281   LearningRate 0.0811   Epoch: 1   Global Step: 24730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:04:03,906-Speed 3112.49 samples/sec   Loss 14.1715   LearningRate 0.0811   Epoch: 1   Global Step: 24740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:04:07,238-Speed 3073.74 samples/sec   Loss 14.1549   LearningRate 0.0811   Epoch: 1   Global Step: 24750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:04:10,575-Speed 3070.54 samples/sec   Loss 14.0938   LearningRate 0.0811   Epoch: 1   Global Step: 24760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:04:13,840-Speed 3136.73 samples/sec   Loss 14.0692   LearningRate 0.0811   Epoch: 1   Global Step: 24770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:04:17,194-Speed 3054.38 samples/sec   Loss 13.9665   LearningRate 0.0810   Epoch: 1   Global Step: 24780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:04:20,510-Speed 3089.27 samples/sec   Loss 13.9075   LearningRate 0.0810   Epoch: 1   Global Step: 24790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:04:23,861-Speed 3056.28 samples/sec   Loss 14.0670   LearningRate 0.0810   Epoch: 1   Global Step: 24800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:04:27,228-Speed 3042.75 samples/sec   Loss 14.0135   LearningRate 0.0810   Epoch: 1   Global Step: 24810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:04:30,593-Speed 3044.26 samples/sec   Loss 14.1973   LearningRate 0.0810   Epoch: 1   Global Step: 24820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:04:33,878-Speed 3117.91 samples/sec   Loss 13.8699   LearningRate 0.0810   Epoch: 1   Global Step: 24830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:04:37,501-Speed 2828.12 samples/sec   Loss 13.9247   LearningRate 0.0810   Epoch: 1   Global Step: 24840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:05:09,125-Speed 323.82 samples/sec   Loss 12.8697   LearningRate 0.0810   Epoch: 2   Global Step: 24850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:05:12,548-Speed 2993.06 samples/sec   Loss 12.5972   LearningRate 0.0810   Epoch: 2   Global Step: 24860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:05:15,824-Speed 3126.15 samples/sec   Loss 12.4119   LearningRate 0.0810   Epoch: 2   Global Step: 24870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:05:19,184-Speed 3048.68 samples/sec   Loss 12.4678   LearningRate 0.0810   Epoch: 2   Global Step: 24880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:05:22,457-Speed 3130.19 samples/sec   Loss 12.5394   LearningRate 0.0810   Epoch: 2   Global Step: 24890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:05:25,754-Speed 3108.35 samples/sec   Loss 12.5143   LearningRate 0.0810   Epoch: 2   Global Step: 24900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:05:29,050-Speed 3107.46 samples/sec   Loss 12.6044   LearningRate 0.0810   Epoch: 2   Global Step: 24910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:05:32,339-Speed 3114.05 samples/sec   Loss 12.4336   LearningRate 0.0809   Epoch: 2   Global Step: 24920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:05:35,685-Speed 3061.87 samples/sec   Loss 12.3095   LearningRate 0.0809   Epoch: 2   Global Step: 24930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:05:39,019-Speed 3072.31 samples/sec   Loss 12.5376   LearningRate 0.0809   Epoch: 2   Global Step: 24940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:05:42,328-Speed 3095.24 samples/sec   Loss 12.6920   LearningRate 0.0809   Epoch: 2   Global Step: 24950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:05:45,668-Speed 3066.92 samples/sec   Loss 12.5537   LearningRate 0.0809   Epoch: 2   Global Step: 24960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:05:49,053-Speed 3026.00 samples/sec   Loss 12.4234   LearningRate 0.0809   Epoch: 2   Global Step: 24970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:05:52,322-Speed 3133.80 samples/sec   Loss 12.5275   LearningRate 0.0809   Epoch: 2   Global Step: 24980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:05:55,622-Speed 3103.94 samples/sec   Loss 12.5730   LearningRate 0.0809   Epoch: 2   Global Step: 24990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:05:58,950-Speed 3077.93 samples/sec   Loss 12.6667   LearningRate 0.0809   Epoch: 2   Global Step: 25000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:06:02,334-Speed 3026.35 samples/sec   Loss 12.6471   LearningRate 0.0809   Epoch: 2   Global Step: 25010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:05,690-Speed 3052.74 samples/sec   Loss 12.6139   LearningRate 0.0809   Epoch: 2   Global Step: 25020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:08,949-Speed 3142.43 samples/sec   Loss 12.6491   LearningRate 0.0809   Epoch: 2   Global Step: 25030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:12,236-Speed 3116.71 samples/sec   Loss 12.7140   LearningRate 0.0809   Epoch: 2   Global Step: 25040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:15,553-Speed 3087.42 samples/sec   Loss 12.5319   LearningRate 0.0808   Epoch: 2   Global Step: 25050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:18,812-Speed 3143.61 samples/sec   Loss 12.6874   LearningRate 0.0808   Epoch: 2   Global Step: 25060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:22,119-Speed 3097.47 samples/sec   Loss 12.8795   LearningRate 0.0808   Epoch: 2   Global Step: 25070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:25,407-Speed 3115.60 samples/sec   Loss 12.6344   LearningRate 0.0808   Epoch: 2   Global Step: 25080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:28,836-Speed 2987.31 samples/sec   Loss 12.6583   LearningRate 0.0808   Epoch: 2   Global Step: 25090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:32,097-Speed 3140.89 samples/sec   Loss 12.6715   LearningRate 0.0808   Epoch: 2   Global Step: 25100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:35,373-Speed 3126.15 samples/sec   Loss 12.7642   LearningRate 0.0808   Epoch: 2   Global Step: 25110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:38,694-Speed 3085.04 samples/sec   Loss 12.7345   LearningRate 0.0808   Epoch: 2   Global Step: 25120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:41,975-Speed 3121.11 samples/sec   Loss 12.6872   LearningRate 0.0808   Epoch: 2   Global Step: 25130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:45,301-Speed 3080.38 samples/sec   Loss 12.8515   LearningRate 0.0808   Epoch: 2   Global Step: 25140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:48,624-Speed 3082.68 samples/sec   Loss 12.7765   LearningRate 0.0808   Epoch: 2   Global Step: 25150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:51,886-Speed 3139.61 samples/sec   Loss 12.6247   LearningRate 0.0808   Epoch: 2   Global Step: 25160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:55,147-Speed 3141.51 samples/sec   Loss 12.7436   LearningRate 0.0808   Epoch: 2   Global Step: 25170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:06:58,463-Speed 3089.44 samples/sec   Loss 12.6856   LearningRate 0.0808   Epoch: 2   Global Step: 25180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:07:01,783-Speed 3084.91 samples/sec   Loss 12.7939   LearningRate 0.0807   Epoch: 2   Global Step: 25190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:07:05,048-Speed 3137.18 samples/sec   Loss 12.8209   LearningRate 0.0807   Epoch: 2   Global Step: 25200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:07:08,354-Speed 3099.07 samples/sec   Loss 12.8890   LearningRate 0.0807   Epoch: 2   Global Step: 25210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:07:11,720-Speed 3042.66 samples/sec   Loss 12.9401   LearningRate 0.0807   Epoch: 2   Global Step: 25220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:07:15,094-Speed 3036.15 samples/sec   Loss 12.7394   LearningRate 0.0807   Epoch: 2   Global Step: 25230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:07:18,393-Speed 3104.76 samples/sec   Loss 12.8329   LearningRate 0.0807   Epoch: 2   Global Step: 25240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:07:21,675-Speed 3121.41 samples/sec   Loss 12.8991   LearningRate 0.0807   Epoch: 2   Global Step: 25250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:07:24,969-Speed 3110.24 samples/sec   Loss 12.8395   LearningRate 0.0807   Epoch: 2   Global Step: 25260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:07:28,262-Speed 3110.55 samples/sec   Loss 12.9104   LearningRate 0.0807   Epoch: 2   Global Step: 25270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:07:31,625-Speed 3045.84 samples/sec   Loss 12.9829   LearningRate 0.0807   Epoch: 2   Global Step: 25280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:07:34,957-Speed 3073.96 samples/sec   Loss 12.8341   LearningRate 0.0807   Epoch: 2   Global Step: 25290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:07:38,349-Speed 3019.81 samples/sec   Loss 12.7605   LearningRate 0.0807   Epoch: 2   Global Step: 25300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:07:41,666-Speed 3087.66 samples/sec   Loss 12.9570   LearningRate 0.0807   Epoch: 2   Global Step: 25310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:07:44,961-Speed 3109.06 samples/sec   Loss 12.9527   LearningRate 0.0807   Epoch: 2   Global Step: 25320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:07:48,246-Speed 3118.62 samples/sec   Loss 12.9605   LearningRate 0.0806   Epoch: 2   Global Step: 25330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:07:51,607-Speed 3047.32 samples/sec   Loss 12.9940   LearningRate 0.0806   Epoch: 2   Global Step: 25340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:07:54,985-Speed 3031.91 samples/sec   Loss 12.8001   LearningRate 0.0806   Epoch: 2   Global Step: 25350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:07:58,343-Speed 3050.66 samples/sec   Loss 13.1202   LearningRate 0.0806   Epoch: 2   Global Step: 25360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:08:01,614-Speed 3131.81 samples/sec   Loss 13.0941   LearningRate 0.0806   Epoch: 2   Global Step: 25370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:08:04,880-Speed 3135.28 samples/sec   Loss 12.9728   LearningRate 0.0806   Epoch: 2   Global Step: 25380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:08:08,177-Speed 3106.95 samples/sec   Loss 13.0060   LearningRate 0.0806   Epoch: 2   Global Step: 25390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:08:11,491-Speed 3091.24 samples/sec   Loss 13.0978   LearningRate 0.0806   Epoch: 2   Global Step: 25400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:08:14,769-Speed 3125.22 samples/sec   Loss 12.9839   LearningRate 0.0806   Epoch: 2   Global Step: 25410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:08:18,102-Speed 3073.00 samples/sec   Loss 12.9628   LearningRate 0.0806   Epoch: 2   Global Step: 25420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:08:21,487-Speed 3026.16 samples/sec   Loss 13.1029   LearningRate 0.0806   Epoch: 2   Global Step: 25430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:08:24,796-Speed 3095.19 samples/sec   Loss 13.0235   LearningRate 0.0806   Epoch: 2   Global Step: 25440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:08:28,145-Speed 3058.96 samples/sec   Loss 13.0573   LearningRate 0.0806   Epoch: 2   Global Step: 25450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:08:31,461-Speed 3089.05 samples/sec   Loss 13.0414   LearningRate 0.0806   Epoch: 2   Global Step: 25460   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 04:08:34,756-Speed 3109.56 samples/sec   Loss 13.0519   LearningRate 0.0805   Epoch: 2   Global Step: 25470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:08:38,052-Speed 3107.46 samples/sec   Loss 13.1016   LearningRate 0.0805   Epoch: 2   Global Step: 25480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:08:41,353-Speed 3103.43 samples/sec   Loss 13.2219   LearningRate 0.0805   Epoch: 2   Global Step: 25490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:08:44,725-Speed 3037.10 samples/sec   Loss 12.9425   LearningRate 0.0805   Epoch: 2   Global Step: 25500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:08:48,056-Speed 3075.13 samples/sec   Loss 13.0005   LearningRate 0.0805   Epoch: 2   Global Step: 25510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:08:51,393-Speed 3070.05 samples/sec   Loss 13.0378   LearningRate 0.0805   Epoch: 2   Global Step: 25520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:08:54,722-Speed 3077.28 samples/sec   Loss 13.0149   LearningRate 0.0805   Epoch: 2   Global Step: 25530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:08:58,060-Speed 3067.88 samples/sec   Loss 13.0576   LearningRate 0.0805   Epoch: 2   Global Step: 25540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:09:01,379-Speed 3086.58 samples/sec   Loss 13.2541   LearningRate 0.0805   Epoch: 2   Global Step: 25550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:09:04,686-Speed 3097.30 samples/sec   Loss 13.1641   LearningRate 0.0805   Epoch: 2   Global Step: 25560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:09:08,104-Speed 2997.33 samples/sec   Loss 13.1687   LearningRate 0.0805   Epoch: 2   Global Step: 25570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:11,405-Speed 3103.20 samples/sec   Loss 13.2196   LearningRate 0.0805   Epoch: 2   Global Step: 25580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:14,755-Speed 3056.73 samples/sec   Loss 13.1589   LearningRate 0.0805   Epoch: 2   Global Step: 25590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:18,025-Speed 3133.30 samples/sec   Loss 13.1455   LearningRate 0.0805   Epoch: 2   Global Step: 25600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:21,299-Speed 3128.89 samples/sec   Loss 13.1844   LearningRate 0.0804   Epoch: 2   Global Step: 25610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:24,637-Speed 3068.89 samples/sec   Loss 13.1072   LearningRate 0.0804   Epoch: 2   Global Step: 25620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:27,945-Speed 3095.91 samples/sec   Loss 13.1669   LearningRate 0.0804   Epoch: 2   Global Step: 25630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:31,241-Speed 3108.30 samples/sec   Loss 13.2314   LearningRate 0.0804   Epoch: 2   Global Step: 25640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:34,558-Speed 3088.02 samples/sec   Loss 13.2228   LearningRate 0.0804   Epoch: 2   Global Step: 25650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:37,850-Speed 3110.96 samples/sec   Loss 13.3326   LearningRate 0.0804   Epoch: 2   Global Step: 25660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:41,195-Speed 3061.91 samples/sec   Loss 13.2058   LearningRate 0.0804   Epoch: 2   Global Step: 25670   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 04:09:44,499-Speed 3100.42 samples/sec   Loss 13.1520   LearningRate 0.0804   Epoch: 2   Global Step: 25680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:47,868-Speed 3040.10 samples/sec   Loss 13.1829   LearningRate 0.0804   Epoch: 2   Global Step: 25690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:51,172-Speed 3100.19 samples/sec   Loss 13.1797   LearningRate 0.0804   Epoch: 2   Global Step: 25700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:54,486-Speed 3091.52 samples/sec   Loss 13.2022   LearningRate 0.0804   Epoch: 2   Global Step: 25710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:09:57,740-Speed 3147.37 samples/sec   Loss 13.1975   LearningRate 0.0804   Epoch: 2   Global Step: 25720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:01,037-Speed 3106.54 samples/sec   Loss 13.2883   LearningRate 0.0804   Epoch: 2   Global Step: 25730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:04,409-Speed 3037.97 samples/sec   Loss 13.2799   LearningRate 0.0804   Epoch: 2   Global Step: 25740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:07,706-Speed 3107.35 samples/sec   Loss 13.3904   LearningRate 0.0803   Epoch: 2   Global Step: 25750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:11,024-Speed 3086.95 samples/sec   Loss 13.1118   LearningRate 0.0803   Epoch: 2   Global Step: 25760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:14,365-Speed 3066.30 samples/sec   Loss 13.2502   LearningRate 0.0803   Epoch: 2   Global Step: 25770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:17,650-Speed 3117.65 samples/sec   Loss 13.2440   LearningRate 0.0803   Epoch: 2   Global Step: 25780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:20,942-Speed 3112.23 samples/sec   Loss 13.1858   LearningRate 0.0803   Epoch: 2   Global Step: 25790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:24,218-Speed 3126.95 samples/sec   Loss 13.2251   LearningRate 0.0803   Epoch: 2   Global Step: 25800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:27,499-Speed 3121.76 samples/sec   Loss 13.2405   LearningRate 0.0803   Epoch: 2   Global Step: 25810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:30,827-Speed 3078.12 samples/sec   Loss 13.3319   LearningRate 0.0803   Epoch: 2   Global Step: 25820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:34,131-Speed 3100.11 samples/sec   Loss 13.2975   LearningRate 0.0803   Epoch: 2   Global Step: 25830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:37,440-Speed 3096.22 samples/sec   Loss 13.2622   LearningRate 0.0803   Epoch: 2   Global Step: 25840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:40,828-Speed 3023.39 samples/sec   Loss 13.3473   LearningRate 0.0803   Epoch: 2   Global Step: 25850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:44,194-Speed 3042.45 samples/sec   Loss 13.2358   LearningRate 0.0803   Epoch: 2   Global Step: 25860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:47,555-Speed 3048.16 samples/sec   Loss 13.2739   LearningRate 0.0803   Epoch: 2   Global Step: 25870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:50,863-Speed 3096.67 samples/sec   Loss 13.2940   LearningRate 0.0802   Epoch: 2   Global Step: 25880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:54,292-Speed 2987.01 samples/sec   Loss 13.2774   LearningRate 0.0802   Epoch: 2   Global Step: 25890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:10:57,695-Speed 3010.79 samples/sec   Loss 13.2098   LearningRate 0.0802   Epoch: 2   Global Step: 25900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:11:01,013-Speed 3086.66 samples/sec   Loss 13.2957   LearningRate 0.0802   Epoch: 2   Global Step: 25910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:11:04,352-Speed 3067.98 samples/sec   Loss 13.2373   LearningRate 0.0802   Epoch: 2   Global Step: 25920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:11:07,681-Speed 3076.98 samples/sec   Loss 13.2499   LearningRate 0.0802   Epoch: 2   Global Step: 25930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:11:10,954-Speed 3129.55 samples/sec   Loss 13.2189   LearningRate 0.0802   Epoch: 2   Global Step: 25940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:11:14,246-Speed 3111.40 samples/sec   Loss 13.2560   LearningRate 0.0802   Epoch: 2   Global Step: 25950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:11:17,518-Speed 3130.70 samples/sec   Loss 13.3923   LearningRate 0.0802   Epoch: 2   Global Step: 25960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:11:20,797-Speed 3123.48 samples/sec   Loss 13.2917   LearningRate 0.0802   Epoch: 2   Global Step: 25970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:11:24,129-Speed 3074.20 samples/sec   Loss 13.3624   LearningRate 0.0802   Epoch: 2   Global Step: 25980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:11:27,451-Speed 3083.42 samples/sec   Loss 13.3054   LearningRate 0.0802   Epoch: 2   Global Step: 25990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:11:30,813-Speed 3047.06 samples/sec   Loss 13.4357   LearningRate 0.0802   Epoch: 2   Global Step: 26000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:11:34,168-Speed 3052.79 samples/sec   Loss 13.5497   LearningRate 0.0802   Epoch: 2   Global Step: 26010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:11:37,463-Speed 3109.26 samples/sec   Loss 13.3280   LearningRate 0.0801   Epoch: 2   Global Step: 26020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:11:40,830-Speed 3042.36 samples/sec   Loss 13.4148   LearningRate 0.0801   Epoch: 2   Global Step: 26030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:11:44,162-Speed 3073.20 samples/sec   Loss 13.4633   LearningRate 0.0801   Epoch: 2   Global Step: 26040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:11:47,475-Speed 3092.37 samples/sec   Loss 13.3002   LearningRate 0.0801   Epoch: 2   Global Step: 26050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:11:50,827-Speed 3055.98 samples/sec   Loss 13.3578   LearningRate 0.0801   Epoch: 2   Global Step: 26060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:11:54,188-Speed 3046.68 samples/sec   Loss 13.3842   LearningRate 0.0801   Epoch: 2   Global Step: 26070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:11:57,564-Speed 3034.52 samples/sec   Loss 13.2572   LearningRate 0.0801   Epoch: 2   Global Step: 26080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:00,892-Speed 3077.55 samples/sec   Loss 13.4383   LearningRate 0.0801   Epoch: 2   Global Step: 26090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:04,255-Speed 3045.48 samples/sec   Loss 13.3738   LearningRate 0.0801   Epoch: 2   Global Step: 26100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:07,600-Speed 3063.29 samples/sec   Loss 13.3507   LearningRate 0.0801   Epoch: 2   Global Step: 26110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:10,953-Speed 3054.76 samples/sec   Loss 13.4124   LearningRate 0.0801   Epoch: 2   Global Step: 26120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:14,244-Speed 3112.14 samples/sec   Loss 13.1991   LearningRate 0.0801   Epoch: 2   Global Step: 26130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:17,540-Speed 3107.03 samples/sec   Loss 13.4492   LearningRate 0.0801   Epoch: 2   Global Step: 26140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:20,834-Speed 3110.31 samples/sec   Loss 13.2803   LearningRate 0.0801   Epoch: 2   Global Step: 26150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:24,189-Speed 3052.39 samples/sec   Loss 13.3670   LearningRate 0.0800   Epoch: 2   Global Step: 26160   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 04:12:27,495-Speed 3098.42 samples/sec   Loss 13.4031   LearningRate 0.0800   Epoch: 2   Global Step: 26170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:30,800-Speed 3100.29 samples/sec   Loss 13.3939   LearningRate 0.0800   Epoch: 2   Global Step: 26180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:34,129-Speed 3075.80 samples/sec   Loss 13.4229   LearningRate 0.0800   Epoch: 2   Global Step: 26190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:37,425-Speed 3107.89 samples/sec   Loss 13.2912   LearningRate 0.0800   Epoch: 2   Global Step: 26200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:40,729-Speed 3100.94 samples/sec   Loss 13.5243   LearningRate 0.0800   Epoch: 2   Global Step: 26210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:44,055-Speed 3078.84 samples/sec   Loss 13.4336   LearningRate 0.0800   Epoch: 2   Global Step: 26220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:47,373-Speed 3087.62 samples/sec   Loss 13.5761   LearningRate 0.0800   Epoch: 2   Global Step: 26230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:50,691-Speed 3087.03 samples/sec   Loss 13.4652   LearningRate 0.0800   Epoch: 2   Global Step: 26240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:54,020-Speed 3077.11 samples/sec   Loss 13.5149   LearningRate 0.0800   Epoch: 2   Global Step: 26250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:12:57,406-Speed 3024.96 samples/sec   Loss 13.4165   LearningRate 0.0800   Epoch: 2   Global Step: 26260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:13:00,694-Speed 3115.33 samples/sec   Loss 13.5074   LearningRate 0.0800   Epoch: 2   Global Step: 26270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:13:04,028-Speed 3071.82 samples/sec   Loss 13.5153   LearningRate 0.0800   Epoch: 2   Global Step: 26280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:13:07,358-Speed 3076.35 samples/sec   Loss 13.2832   LearningRate 0.0800   Epoch: 2   Global Step: 26290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:13:10,643-Speed 3117.44 samples/sec   Loss 13.3348   LearningRate 0.0799   Epoch: 2   Global Step: 26300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:13:13,981-Speed 3068.71 samples/sec   Loss 13.3035   LearningRate 0.0799   Epoch: 2   Global Step: 26310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:13:17,279-Speed 3105.65 samples/sec   Loss 13.4156   LearningRate 0.0799   Epoch: 2   Global Step: 26320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:13:20,676-Speed 3016.87 samples/sec   Loss 13.5227   LearningRate 0.0799   Epoch: 2   Global Step: 26330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:13:23,968-Speed 3111.59 samples/sec   Loss 13.3722   LearningRate 0.0799   Epoch: 2   Global Step: 26340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:13:27,321-Speed 3053.79 samples/sec   Loss 13.4578   LearningRate 0.0799   Epoch: 2   Global Step: 26350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:13:30,670-Speed 3058.54 samples/sec   Loss 13.6569   LearningRate 0.0799   Epoch: 2   Global Step: 26360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:13:33,991-Speed 3084.69 samples/sec   Loss 13.5843   LearningRate 0.0799   Epoch: 2   Global Step: 26370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:13:37,355-Speed 3045.19 samples/sec   Loss 13.3414   LearningRate 0.0799   Epoch: 2   Global Step: 26380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:13:40,658-Speed 3101.10 samples/sec   Loss 13.4404   LearningRate 0.0799   Epoch: 2   Global Step: 26390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:13:43,925-Speed 3135.31 samples/sec   Loss 13.5661   LearningRate 0.0799   Epoch: 2   Global Step: 26400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:13:47,197-Speed 3130.02 samples/sec   Loss 13.4212   LearningRate 0.0799   Epoch: 2   Global Step: 26410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:13:50,529-Speed 3074.40 samples/sec   Loss 13.5245   LearningRate 0.0799   Epoch: 2   Global Step: 26420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:13:53,818-Speed 3114.59 samples/sec   Loss 13.4970   LearningRate 0.0799   Epoch: 2   Global Step: 26430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:13:57,098-Speed 3122.54 samples/sec   Loss 13.5489   LearningRate 0.0798   Epoch: 2   Global Step: 26440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:00,386-Speed 3115.50 samples/sec   Loss 13.5165   LearningRate 0.0798   Epoch: 2   Global Step: 26450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:03,652-Speed 3136.74 samples/sec   Loss 13.5680   LearningRate 0.0798   Epoch: 2   Global Step: 26460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:06,942-Speed 3113.46 samples/sec   Loss 13.4717   LearningRate 0.0798   Epoch: 2   Global Step: 26470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:10,212-Speed 3131.76 samples/sec   Loss 13.5956   LearningRate 0.0798   Epoch: 2   Global Step: 26480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:13,556-Speed 3063.40 samples/sec   Loss 13.4282   LearningRate 0.0798   Epoch: 2   Global Step: 26490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:16,927-Speed 3039.12 samples/sec   Loss 13.5331   LearningRate 0.0798   Epoch: 2   Global Step: 26500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:20,286-Speed 3049.30 samples/sec   Loss 13.5100   LearningRate 0.0798   Epoch: 2   Global Step: 26510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:23,687-Speed 3011.61 samples/sec   Loss 13.4870   LearningRate 0.0798   Epoch: 2   Global Step: 26520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:26,968-Speed 3122.06 samples/sec   Loss 13.6275   LearningRate 0.0798   Epoch: 2   Global Step: 26530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:30,298-Speed 3076.24 samples/sec   Loss 13.4800   LearningRate 0.0798   Epoch: 2   Global Step: 26540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:33,648-Speed 3057.96 samples/sec   Loss 13.5390   LearningRate 0.0798   Epoch: 2   Global Step: 26550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:37,031-Speed 3027.44 samples/sec   Loss 13.4445   LearningRate 0.0798   Epoch: 2   Global Step: 26560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:40,350-Speed 3086.36 samples/sec   Loss 13.5555   LearningRate 0.0798   Epoch: 2   Global Step: 26570   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 04:14:43,639-Speed 3114.83 samples/sec   Loss 13.6011   LearningRate 0.0797   Epoch: 2   Global Step: 26580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:46,933-Speed 3109.83 samples/sec   Loss 13.5171   LearningRate 0.0797   Epoch: 2   Global Step: 26590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:50,268-Speed 3070.71 samples/sec   Loss 13.7166   LearningRate 0.0797   Epoch: 2   Global Step: 26600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:53,554-Speed 3117.24 samples/sec   Loss 13.4712   LearningRate 0.0797   Epoch: 2   Global Step: 26610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:14:56,901-Speed 3060.71 samples/sec   Loss 13.3705   LearningRate 0.0797   Epoch: 2   Global Step: 26620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:00,234-Speed 3073.45 samples/sec   Loss 13.6024   LearningRate 0.0797   Epoch: 2   Global Step: 26630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:03,581-Speed 3059.55 samples/sec   Loss 13.3976   LearningRate 0.0797   Epoch: 2   Global Step: 26640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:06,932-Speed 3057.07 samples/sec   Loss 13.5172   LearningRate 0.0797   Epoch: 2   Global Step: 26650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:10,258-Speed 3079.40 samples/sec   Loss 13.3320   LearningRate 0.0797   Epoch: 2   Global Step: 26660   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:13,605-Speed 3061.15 samples/sec   Loss 13.5578   LearningRate 0.0797   Epoch: 2   Global Step: 26670   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:16,923-Speed 3087.17 samples/sec   Loss 13.7853   LearningRate 0.0797   Epoch: 2   Global Step: 26680   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:20,279-Speed 3051.83 samples/sec   Loss 13.7889   LearningRate 0.0797   Epoch: 2   Global Step: 26690   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:23,658-Speed 3030.74 samples/sec   Loss 13.5547   LearningRate 0.0797   Epoch: 2   Global Step: 26700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:27,038-Speed 3030.46 samples/sec   Loss 13.6240   LearningRate 0.0797   Epoch: 2   Global Step: 26710   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:30,335-Speed 3107.15 samples/sec   Loss 13.4607   LearningRate 0.0796   Epoch: 2   Global Step: 26720   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:33,658-Speed 3082.24 samples/sec   Loss 13.6555   LearningRate 0.0796   Epoch: 2   Global Step: 26730   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:36,984-Speed 3080.45 samples/sec   Loss 13.5995   LearningRate 0.0796   Epoch: 2   Global Step: 26740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:40,312-Speed 3077.60 samples/sec   Loss 13.4418   LearningRate 0.0796   Epoch: 2   Global Step: 26750   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:43,681-Speed 3040.51 samples/sec   Loss 13.6021   LearningRate 0.0796   Epoch: 2   Global Step: 26760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:47,051-Speed 3039.64 samples/sec   Loss 13.5016   LearningRate 0.0796   Epoch: 2   Global Step: 26770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:50,425-Speed 3036.18 samples/sec   Loss 13.4610   LearningRate 0.0796   Epoch: 2   Global Step: 26780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:53,727-Speed 3101.96 samples/sec   Loss 13.5754   LearningRate 0.0796   Epoch: 2   Global Step: 26790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:15:57,030-Speed 3100.98 samples/sec   Loss 13.5721   LearningRate 0.0796   Epoch: 2   Global Step: 26800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:16:00,296-Speed 3136.37 samples/sec   Loss 13.5738   LearningRate 0.0796   Epoch: 2   Global Step: 26810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:16:03,636-Speed 3066.71 samples/sec   Loss 13.6350   LearningRate 0.0796   Epoch: 2   Global Step: 26820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:16:06,980-Speed 3063.07 samples/sec   Loss 13.7326   LearningRate 0.0796   Epoch: 2   Global Step: 26830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:16:10,317-Speed 3069.56 samples/sec   Loss 13.3724   LearningRate 0.0796   Epoch: 2   Global Step: 26840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:16:13,665-Speed 3060.03 samples/sec   Loss 13.5761   LearningRate 0.0796   Epoch: 2   Global Step: 26850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:16:17,006-Speed 3066.26 samples/sec   Loss 13.4776   LearningRate 0.0795   Epoch: 2   Global Step: 26860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:16:20,370-Speed 3044.24 samples/sec   Loss 13.5383   LearningRate 0.0795   Epoch: 2   Global Step: 26870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:16:23,736-Speed 3043.03 samples/sec   Loss 13.5240   LearningRate 0.0795   Epoch: 2   Global Step: 26880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:16:27,028-Speed 3112.15 samples/sec   Loss 13.7066   LearningRate 0.0795   Epoch: 2   Global Step: 26890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:16:30,410-Speed 3028.30 samples/sec   Loss 13.6809   LearningRate 0.0795   Epoch: 2   Global Step: 26900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:16:33,774-Speed 3045.07 samples/sec   Loss 13.6003   LearningRate 0.0795   Epoch: 2   Global Step: 26910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:16:37,112-Speed 3067.87 samples/sec   Loss 13.6186   LearningRate 0.0795   Epoch: 2   Global Step: 26920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:16:40,447-Speed 3071.77 samples/sec   Loss 13.5171   LearningRate 0.0795   Epoch: 2   Global Step: 26930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:16:43,750-Speed 3101.63 samples/sec   Loss 13.6327   LearningRate 0.0795   Epoch: 2   Global Step: 26940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:16:47,022-Speed 3130.81 samples/sec   Loss 13.5886   LearningRate 0.0795   Epoch: 2   Global Step: 26950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:16:50,404-Speed 3028.94 samples/sec   Loss 13.5488   LearningRate 0.0795   Epoch: 2   Global Step: 26960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:16:53,725-Speed 3084.37 samples/sec   Loss 13.7017   LearningRate 0.0795   Epoch: 2   Global Step: 26970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:16:57,184-Speed 2960.58 samples/sec   Loss 13.6403   LearningRate 0.0795   Epoch: 2   Global Step: 26980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:00,585-Speed 3012.49 samples/sec   Loss 13.6375   LearningRate 0.0795   Epoch: 2   Global Step: 26990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:03,905-Speed 3084.67 samples/sec   Loss 13.5953   LearningRate 0.0794   Epoch: 2   Global Step: 27000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:07,247-Speed 3065.46 samples/sec   Loss 13.6985   LearningRate 0.0794   Epoch: 2   Global Step: 27010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:10,571-Speed 3081.76 samples/sec   Loss 13.6207   LearningRate 0.0794   Epoch: 2   Global Step: 27020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:13,850-Speed 3123.43 samples/sec   Loss 13.6665   LearningRate 0.0794   Epoch: 2   Global Step: 27030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:17,212-Speed 3047.32 samples/sec   Loss 13.7795   LearningRate 0.0794   Epoch: 2   Global Step: 27040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:20,564-Speed 3055.04 samples/sec   Loss 13.6211   LearningRate 0.0794   Epoch: 2   Global Step: 27050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:23,853-Speed 3114.63 samples/sec   Loss 13.6302   LearningRate 0.0794   Epoch: 2   Global Step: 27060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:27,146-Speed 3110.33 samples/sec   Loss 13.7523   LearningRate 0.0794   Epoch: 2   Global Step: 27070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:30,464-Speed 3087.41 samples/sec   Loss 13.4932   LearningRate 0.0794   Epoch: 2   Global Step: 27080   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 04:17:33,761-Speed 3106.93 samples/sec   Loss 13.4713   LearningRate 0.0794   Epoch: 2   Global Step: 27090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:37,063-Speed 3102.05 samples/sec   Loss 13.5397   LearningRate 0.0794   Epoch: 2   Global Step: 27100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:40,369-Speed 3098.48 samples/sec   Loss 13.6291   LearningRate 0.0794   Epoch: 2   Global Step: 27110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:43,643-Speed 3128.82 samples/sec   Loss 13.5371   LearningRate 0.0794   Epoch: 2   Global Step: 27120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:46,970-Speed 3079.19 samples/sec   Loss 13.5747   LearningRate 0.0794   Epoch: 2   Global Step: 27130   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:50,225-Speed 3146.06 samples/sec   Loss 13.5265   LearningRate 0.0793   Epoch: 2   Global Step: 27140   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:53,563-Speed 3069.17 samples/sec   Loss 13.6397   LearningRate 0.0793   Epoch: 2   Global Step: 27150   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:17:56,856-Speed 3110.28 samples/sec   Loss 13.6325   LearningRate 0.0793   Epoch: 2   Global Step: 27160   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:00,204-Speed 3059.72 samples/sec   Loss 13.6784   LearningRate 0.0793   Epoch: 2   Global Step: 27170   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:03,508-Speed 3099.54 samples/sec   Loss 13.7335   LearningRate 0.0793   Epoch: 2   Global Step: 27180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:06,855-Speed 3060.56 samples/sec   Loss 13.6488   LearningRate 0.0793   Epoch: 2   Global Step: 27190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:10,240-Speed 3027.06 samples/sec   Loss 13.4915   LearningRate 0.0793   Epoch: 2   Global Step: 27200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:13,545-Speed 3099.33 samples/sec   Loss 13.6367   LearningRate 0.0793   Epoch: 2   Global Step: 27210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:16,853-Speed 3096.95 samples/sec   Loss 13.5852   LearningRate 0.0793   Epoch: 2   Global Step: 27220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:20,188-Speed 3071.64 samples/sec   Loss 13.8103   LearningRate 0.0793   Epoch: 2   Global Step: 27230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:23,503-Speed 3089.69 samples/sec   Loss 13.7371   LearningRate 0.0793   Epoch: 2   Global Step: 27240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:26,788-Speed 3118.65 samples/sec   Loss 13.6256   LearningRate 0.0793   Epoch: 2   Global Step: 27250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:30,068-Speed 3123.24 samples/sec   Loss 13.5175   LearningRate 0.0793   Epoch: 2   Global Step: 27260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:33,383-Speed 3089.50 samples/sec   Loss 13.6015   LearningRate 0.0793   Epoch: 2   Global Step: 27270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:36,699-Speed 3089.05 samples/sec   Loss 13.6799   LearningRate 0.0792   Epoch: 2   Global Step: 27280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:40,087-Speed 3023.68 samples/sec   Loss 13.5009   LearningRate 0.0792   Epoch: 2   Global Step: 27290   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 04:18:43,401-Speed 3090.29 samples/sec   Loss 13.7015   LearningRate 0.0792   Epoch: 2   Global Step: 27300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:46,693-Speed 3111.85 samples/sec   Loss 13.3657   LearningRate 0.0792   Epoch: 2   Global Step: 27310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:50,087-Speed 3017.33 samples/sec   Loss 13.5644   LearningRate 0.0792   Epoch: 2   Global Step: 27320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:53,490-Speed 3010.13 samples/sec   Loss 13.5812   LearningRate 0.0792   Epoch: 2   Global Step: 27330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:18:56,821-Speed 3074.97 samples/sec   Loss 13.8772   LearningRate 0.0792   Epoch: 2   Global Step: 27340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:19:00,169-Speed 3059.10 samples/sec   Loss 13.7292   LearningRate 0.0792   Epoch: 2   Global Step: 27350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:19:03,528-Speed 3050.23 samples/sec   Loss 13.8678   LearningRate 0.0792   Epoch: 2   Global Step: 27360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:19:06,824-Speed 3107.75 samples/sec   Loss 13.6787   LearningRate 0.0792   Epoch: 2   Global Step: 27370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:19:10,198-Speed 3035.82 samples/sec   Loss 13.7633   LearningRate 0.0792   Epoch: 2   Global Step: 27380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:19:13,494-Speed 3107.46 samples/sec   Loss 13.6325   LearningRate 0.0792   Epoch: 2   Global Step: 27390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:19:16,813-Speed 3087.83 samples/sec   Loss 13.7208   LearningRate 0.0792   Epoch: 2   Global Step: 27400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:19:20,143-Speed 3076.03 samples/sec   Loss 13.6611   LearningRate 0.0791   Epoch: 2   Global Step: 27410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:19:23,416-Speed 3129.89 samples/sec   Loss 13.7780   LearningRate 0.0791   Epoch: 2   Global Step: 27420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:19:26,741-Speed 3080.55 samples/sec   Loss 13.5888   LearningRate 0.0791   Epoch: 2   Global Step: 27430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:19:30,014-Speed 3129.70 samples/sec   Loss 13.7857   LearningRate 0.0791   Epoch: 2   Global Step: 27440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:19:33,338-Speed 3081.29 samples/sec   Loss 13.7826   LearningRate 0.0791   Epoch: 2   Global Step: 27450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:19:36,684-Speed 3061.16 samples/sec   Loss 13.5985   LearningRate 0.0791   Epoch: 2   Global Step: 27460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:19:40,023-Speed 3067.98 samples/sec   Loss 13.6958   LearningRate 0.0791   Epoch: 2   Global Step: 27470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:19:43,334-Speed 3093.59 samples/sec   Loss 13.6256   LearningRate 0.0791   Epoch: 2   Global Step: 27480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:19:46,663-Speed 3077.05 samples/sec   Loss 13.7637   LearningRate 0.0791   Epoch: 2   Global Step: 27490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:19:49,933-Speed 3132.87 samples/sec   Loss 13.6159   LearningRate 0.0791   Epoch: 2   Global Step: 27500   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:19:53,259-Speed 3078.93 samples/sec   Loss 13.5868   LearningRate 0.0791   Epoch: 2   Global Step: 27510   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:19:56,588-Speed 3077.04 samples/sec   Loss 13.6989   LearningRate 0.0791   Epoch: 2   Global Step: 27520   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:19:59,908-Speed 3085.49 samples/sec   Loss 13.7079   LearningRate 0.0791   Epoch: 2   Global Step: 27530   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:20:03,185-Speed 3125.71 samples/sec   Loss 13.7187   LearningRate 0.0791   Epoch: 2   Global Step: 27540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:20:06,572-Speed 3024.76 samples/sec   Loss 13.7556   LearningRate 0.0790   Epoch: 2   Global Step: 27550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:20:09,864-Speed 3110.98 samples/sec   Loss 13.7629   LearningRate 0.0790   Epoch: 2   Global Step: 27560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:20:13,194-Speed 3076.35 samples/sec   Loss 13.7720   LearningRate 0.0790   Epoch: 2   Global Step: 27570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:20:16,525-Speed 3074.71 samples/sec   Loss 13.5793   LearningRate 0.0790   Epoch: 2   Global Step: 27580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:20:19,847-Speed 3083.28 samples/sec   Loss 13.6874   LearningRate 0.0790   Epoch: 2   Global Step: 27590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:20:23,111-Speed 3138.19 samples/sec   Loss 13.6645   LearningRate 0.0790   Epoch: 2   Global Step: 27600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:20:26,435-Speed 3081.68 samples/sec   Loss 13.7392   LearningRate 0.0790   Epoch: 2   Global Step: 27610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:20:29,796-Speed 3047.56 samples/sec   Loss 13.6405   LearningRate 0.0790   Epoch: 2   Global Step: 27620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:20:33,095-Speed 3104.49 samples/sec   Loss 13.6857   LearningRate 0.0790   Epoch: 2   Global Step: 27630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:20:36,350-Speed 3147.11 samples/sec   Loss 13.7881   LearningRate 0.0790   Epoch: 2   Global Step: 27640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:20:39,672-Speed 3083.17 samples/sec   Loss 13.5655   LearningRate 0.0790   Epoch: 2   Global Step: 27650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:20:42,995-Speed 3083.33 samples/sec   Loss 13.7114   LearningRate 0.0790   Epoch: 2   Global Step: 27660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:20:46,402-Speed 3005.91 samples/sec   Loss 13.6711   LearningRate 0.0790   Epoch: 2   Global Step: 27670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:20:49,733-Speed 3075.22 samples/sec   Loss 13.6167   LearningRate 0.0790   Epoch: 2   Global Step: 27680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:20:53,025-Speed 3112.25 samples/sec   Loss 13.7310   LearningRate 0.0789   Epoch: 2   Global Step: 27690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:20:56,341-Speed 3088.32 samples/sec   Loss 13.6701   LearningRate 0.0789   Epoch: 2   Global Step: 27700   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:20:59,663-Speed 3084.07 samples/sec   Loss 13.6730   LearningRate 0.0789   Epoch: 2   Global Step: 27710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:02,992-Speed 3076.93 samples/sec   Loss 13.6371   LearningRate 0.0789   Epoch: 2   Global Step: 27720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:06,362-Speed 3038.49 samples/sec   Loss 13.6714   LearningRate 0.0789   Epoch: 2   Global Step: 27730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:09,752-Speed 3021.83 samples/sec   Loss 13.5844   LearningRate 0.0789   Epoch: 2   Global Step: 27740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:13,060-Speed 3096.39 samples/sec   Loss 13.6538   LearningRate 0.0789   Epoch: 2   Global Step: 27750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:16,318-Speed 3144.14 samples/sec   Loss 13.4738   LearningRate 0.0789   Epoch: 2   Global Step: 27760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:19,667-Speed 3058.85 samples/sec   Loss 13.5471   LearningRate 0.0789   Epoch: 2   Global Step: 27770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:23,037-Speed 3039.35 samples/sec   Loss 13.7857   LearningRate 0.0789   Epoch: 2   Global Step: 27780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:26,303-Speed 3136.25 samples/sec   Loss 13.5663   LearningRate 0.0789   Epoch: 2   Global Step: 27790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:29,649-Speed 3060.87 samples/sec   Loss 13.5161   LearningRate 0.0789   Epoch: 2   Global Step: 27800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:33,005-Speed 3051.64 samples/sec   Loss 13.8040   LearningRate 0.0789   Epoch: 2   Global Step: 27810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:21:36,375-Speed 3039.65 samples/sec   Loss 13.5301   LearningRate 0.0789   Epoch: 2   Global Step: 27820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:21:39,691-Speed 3088.71 samples/sec   Loss 13.5731   LearningRate 0.0788   Epoch: 2   Global Step: 27830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:43,022-Speed 3075.22 samples/sec   Loss 13.6296   LearningRate 0.0788   Epoch: 2   Global Step: 27840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:46,361-Speed 3067.23 samples/sec   Loss 13.5407   LearningRate 0.0788   Epoch: 2   Global Step: 27850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:49,711-Speed 3057.78 samples/sec   Loss 13.7258   LearningRate 0.0788   Epoch: 2   Global Step: 27860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:53,001-Speed 3113.04 samples/sec   Loss 13.7074   LearningRate 0.0788   Epoch: 2   Global Step: 27870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:56,315-Speed 3091.42 samples/sec   Loss 13.7622   LearningRate 0.0788   Epoch: 2   Global Step: 27880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:21:59,670-Speed 3053.17 samples/sec   Loss 13.5730   LearningRate 0.0788   Epoch: 2   Global Step: 27890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:02,937-Speed 3135.43 samples/sec   Loss 13.6018   LearningRate 0.0788   Epoch: 2   Global Step: 27900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:06,207-Speed 3131.97 samples/sec   Loss 13.6161   LearningRate 0.0788   Epoch: 2   Global Step: 27910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:09,500-Speed 3110.65 samples/sec   Loss 13.6994   LearningRate 0.0788   Epoch: 2   Global Step: 27920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:12,864-Speed 3044.66 samples/sec   Loss 13.7788   LearningRate 0.0788   Epoch: 2   Global Step: 27930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:22:16,135-Speed 3131.49 samples/sec   Loss 13.7415   LearningRate 0.0788   Epoch: 2   Global Step: 27940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:22:19,505-Speed 3040.19 samples/sec   Loss 13.5957   LearningRate 0.0788   Epoch: 2   Global Step: 27950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:22:22,824-Speed 3086.00 samples/sec   Loss 13.6832   LearningRate 0.0788   Epoch: 2   Global Step: 27960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:22:26,164-Speed 3067.36 samples/sec   Loss 13.5804   LearningRate 0.0787   Epoch: 2   Global Step: 27970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:22:29,498-Speed 3072.17 samples/sec   Loss 13.7327   LearningRate 0.0787   Epoch: 2   Global Step: 27980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:32,802-Speed 3099.81 samples/sec   Loss 13.6619   LearningRate 0.0787   Epoch: 2   Global Step: 27990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:36,091-Speed 3114.15 samples/sec   Loss 13.7651   LearningRate 0.0787   Epoch: 2   Global Step: 28000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:39,372-Speed 3122.60 samples/sec   Loss 13.6732   LearningRate 0.0787   Epoch: 2   Global Step: 28010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:42,681-Speed 3095.93 samples/sec   Loss 13.5761   LearningRate 0.0787   Epoch: 2   Global Step: 28020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:45,981-Speed 3103.44 samples/sec   Loss 13.7041   LearningRate 0.0787   Epoch: 2   Global Step: 28030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:49,346-Speed 3044.09 samples/sec   Loss 13.6100   LearningRate 0.0787   Epoch: 2   Global Step: 28040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:52,715-Speed 3040.23 samples/sec   Loss 13.6622   LearningRate 0.0787   Epoch: 2   Global Step: 28050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:56,121-Speed 3008.05 samples/sec   Loss 13.7907   LearningRate 0.0787   Epoch: 2   Global Step: 28060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:22:59,468-Speed 3060.16 samples/sec   Loss 13.6090   LearningRate 0.0787   Epoch: 2   Global Step: 28070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:23:02,799-Speed 3075.24 samples/sec   Loss 13.5500   LearningRate 0.0787   Epoch: 2   Global Step: 28080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:23:06,138-Speed 3067.81 samples/sec   Loss 13.6471   LearningRate 0.0787   Epoch: 2   Global Step: 28090   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:23:09,448-Speed 3094.34 samples/sec   Loss 13.5439   LearningRate 0.0787   Epoch: 2   Global Step: 28100   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:23:12,742-Speed 3109.78 samples/sec   Loss 13.6082   LearningRate 0.0786   Epoch: 2   Global Step: 28110   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:23:16,088-Speed 3061.01 samples/sec   Loss 13.6722   LearningRate 0.0786   Epoch: 2   Global Step: 28120   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:23:19,338-Speed 3151.96 samples/sec   Loss 13.7552   LearningRate 0.0786   Epoch: 2   Global Step: 28130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:23:22,657-Speed 3086.06 samples/sec   Loss 13.5251   LearningRate 0.0786   Epoch: 2   Global Step: 28140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:23:26,035-Speed 3032.78 samples/sec   Loss 13.7212   LearningRate 0.0786   Epoch: 2   Global Step: 28150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:23:29,307-Speed 3129.97 samples/sec   Loss 13.5585   LearningRate 0.0786   Epoch: 2   Global Step: 28160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:23:32,689-Speed 3028.38 samples/sec   Loss 13.6549   LearningRate 0.0786   Epoch: 2   Global Step: 28170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:23:36,055-Speed 3043.36 samples/sec   Loss 13.6119   LearningRate 0.0786   Epoch: 2   Global Step: 28180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:23:39,401-Speed 3061.31 samples/sec   Loss 13.5155   LearningRate 0.0786   Epoch: 2   Global Step: 28190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:23:42,713-Speed 3093.08 samples/sec   Loss 13.7530   LearningRate 0.0786   Epoch: 2   Global Step: 28200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:23:46,072-Speed 3049.34 samples/sec   Loss 13.6362   LearningRate 0.0786   Epoch: 2   Global Step: 28210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:23:49,328-Speed 3145.40 samples/sec   Loss 13.7777   LearningRate 0.0786   Epoch: 2   Global Step: 28220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 04:23:52,585-Speed 3145.36 samples/sec   Loss 13.7361   LearningRate 0.0786   Epoch: 2   Global Step: 28230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:23:55,907-Speed 3084.08 samples/sec   Loss 13.6583   LearningRate 0.0786   Epoch: 2   Global Step: 28240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:23:59,320-Speed 3000.81 samples/sec   Loss 13.5749   LearningRate 0.0785   Epoch: 2   Global Step: 28250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:02,607-Speed 3116.73 samples/sec   Loss 13.5988   LearningRate 0.0785   Epoch: 2   Global Step: 28260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:05,881-Speed 3128.02 samples/sec   Loss 13.7135   LearningRate 0.0785   Epoch: 2   Global Step: 28270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:09,248-Speed 3043.25 samples/sec   Loss 13.6831   LearningRate 0.0785   Epoch: 2   Global Step: 28280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:12,564-Speed 3088.22 samples/sec   Loss 13.5543   LearningRate 0.0785   Epoch: 2   Global Step: 28290   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:15,896-Speed 3074.12 samples/sec   Loss 13.5843   LearningRate 0.0785   Epoch: 2   Global Step: 28300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:19,205-Speed 3096.60 samples/sec   Loss 13.5057   LearningRate 0.0785   Epoch: 2   Global Step: 28310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:22,510-Speed 3098.64 samples/sec   Loss 13.8234   LearningRate 0.0785   Epoch: 2   Global Step: 28320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:25,848-Speed 3069.32 samples/sec   Loss 13.5930   LearningRate 0.0785   Epoch: 2   Global Step: 28330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:29,143-Speed 3108.13 samples/sec   Loss 13.8097   LearningRate 0.0785   Epoch: 2   Global Step: 28340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:32,516-Speed 3036.81 samples/sec   Loss 13.6475   LearningRate 0.0785   Epoch: 2   Global Step: 28350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:35,961-Speed 2973.15 samples/sec   Loss 13.6687   LearningRate 0.0785   Epoch: 2   Global Step: 28360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:39,368-Speed 3006.75 samples/sec   Loss 13.6495   LearningRate 0.0785   Epoch: 2   Global Step: 28370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:42,671-Speed 3101.30 samples/sec   Loss 13.5850   LearningRate 0.0785   Epoch: 2   Global Step: 28380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:46,000-Speed 3076.28 samples/sec   Loss 13.5174   LearningRate 0.0784   Epoch: 2   Global Step: 28390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:49,321-Speed 3085.58 samples/sec   Loss 13.6386   LearningRate 0.0784   Epoch: 2   Global Step: 28400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:52,679-Speed 3050.27 samples/sec   Loss 13.6636   LearningRate 0.0784   Epoch: 2   Global Step: 28410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:56,098-Speed 2996.17 samples/sec   Loss 13.5115   LearningRate 0.0784   Epoch: 2   Global Step: 28420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:24:59,364-Speed 3135.94 samples/sec   Loss 13.6837   LearningRate 0.0784   Epoch: 2   Global Step: 28430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:25:02,701-Speed 3070.25 samples/sec   Loss 13.8603   LearningRate 0.0784   Epoch: 2   Global Step: 28440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:25:06,014-Speed 3091.56 samples/sec   Loss 13.5724   LearningRate 0.0784   Epoch: 2   Global Step: 28450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:25:09,367-Speed 3054.92 samples/sec   Loss 13.6485   LearningRate 0.0784   Epoch: 2   Global Step: 28460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:25:12,755-Speed 3023.68 samples/sec   Loss 13.5851   LearningRate 0.0784   Epoch: 2   Global Step: 28470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:25:16,041-Speed 3117.02 samples/sec   Loss 13.5983   LearningRate 0.0784   Epoch: 2   Global Step: 28480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 04:25:19,314-Speed 3130.34 samples/sec   Loss 13.6311   LearningRate 0.0784   Epoch: 2   Global Step: 28490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:25:22,655-Speed 3065.01 samples/sec   Loss 13.7901   LearningRate 0.0784   Epoch: 2   Global Step: 28500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:25:25,989-Speed 3073.03 samples/sec   Loss 13.6035   LearningRate 0.0784   Epoch: 2   Global Step: 28510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:25:29,359-Speed 3039.01 samples/sec   Loss 13.7311   LearningRate 0.0784   Epoch: 2   Global Step: 28520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:25:32,709-Speed 3057.61 samples/sec   Loss 13.8125   LearningRate 0.0783   Epoch: 2   Global Step: 28530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:25:36,071-Speed 3046.65 samples/sec   Loss 13.5980   LearningRate 0.0783   Epoch: 2   Global Step: 28540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:25:39,408-Speed 3070.06 samples/sec   Loss 13.5439   LearningRate 0.0783   Epoch: 2   Global Step: 28550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:25:42,703-Speed 3108.98 samples/sec   Loss 13.6165   LearningRate 0.0783   Epoch: 2   Global Step: 28560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:25:46,068-Speed 3043.66 samples/sec   Loss 13.7239   LearningRate 0.0783   Epoch: 2   Global Step: 28570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:25:49,422-Speed 3054.20 samples/sec   Loss 13.5877   LearningRate 0.0783   Epoch: 2   Global Step: 28580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:25:52,746-Speed 3081.71 samples/sec   Loss 13.5449   LearningRate 0.0783   Epoch: 2   Global Step: 28590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:25:56,100-Speed 3053.81 samples/sec   Loss 13.5843   LearningRate 0.0783   Epoch: 2   Global Step: 28600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:25:59,420-Speed 3085.42 samples/sec   Loss 13.8068   LearningRate 0.0783   Epoch: 2   Global Step: 28610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:26:02,737-Speed 3087.66 samples/sec   Loss 13.4571   LearningRate 0.0783   Epoch: 2   Global Step: 28620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:26:06,103-Speed 3043.67 samples/sec   Loss 13.6266   LearningRate 0.0783   Epoch: 2   Global Step: 28630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:26:09,471-Speed 3041.29 samples/sec   Loss 13.5626   LearningRate 0.0783   Epoch: 2   Global Step: 28640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:26:12,819-Speed 3059.68 samples/sec   Loss 13.6635   LearningRate 0.0783   Epoch: 2   Global Step: 28650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:26:16,137-Speed 3087.23 samples/sec   Loss 13.6692   LearningRate 0.0783   Epoch: 2   Global Step: 28660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:26:19,519-Speed 3028.86 samples/sec   Loss 13.6678   LearningRate 0.0783   Epoch: 2   Global Step: 28670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:26:22,864-Speed 3061.94 samples/sec   Loss 13.6897   LearningRate 0.0782   Epoch: 2   Global Step: 28680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:26:26,230-Speed 3043.58 samples/sec   Loss 13.4713   LearningRate 0.0782   Epoch: 2   Global Step: 28690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:26:29,507-Speed 3126.24 samples/sec   Loss 13.6912   LearningRate 0.0782   Epoch: 2   Global Step: 28700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:26:32,853-Speed 3061.41 samples/sec   Loss 13.5977   LearningRate 0.0782   Epoch: 2   Global Step: 28710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:26:36,189-Speed 3070.76 samples/sec   Loss 13.5805   LearningRate 0.0782   Epoch: 2   Global Step: 28720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:26:39,561-Speed 3036.88 samples/sec   Loss 13.6973   LearningRate 0.0782   Epoch: 2   Global Step: 28730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:26:42,885-Speed 3081.85 samples/sec   Loss 13.6772   LearningRate 0.0782   Epoch: 2   Global Step: 28740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:26:46,154-Speed 3132.90 samples/sec   Loss 13.5886   LearningRate 0.0782   Epoch: 2   Global Step: 28750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:26:49,572-Speed 2997.35 samples/sec   Loss 13.5097   LearningRate 0.0782   Epoch: 2   Global Step: 28760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:26:52,959-Speed 3023.93 samples/sec   Loss 13.6683   LearningRate 0.0782   Epoch: 2   Global Step: 28770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:26:56,347-Speed 3023.47 samples/sec   Loss 13.6752   LearningRate 0.0782   Epoch: 2   Global Step: 28780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:26:59,650-Speed 3101.14 samples/sec   Loss 13.6555   LearningRate 0.0782   Epoch: 2   Global Step: 28790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:02,976-Speed 3079.67 samples/sec   Loss 13.6190   LearningRate 0.0782   Epoch: 2   Global Step: 28800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:06,260-Speed 3119.35 samples/sec   Loss 13.5719   LearningRate 0.0782   Epoch: 2   Global Step: 28810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:09,580-Speed 3085.48 samples/sec   Loss 13.6199   LearningRate 0.0781   Epoch: 2   Global Step: 28820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:12,832-Speed 3149.32 samples/sec   Loss 13.6784   LearningRate 0.0781   Epoch: 2   Global Step: 28830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:16,199-Speed 3042.27 samples/sec   Loss 13.6444   LearningRate 0.0781   Epoch: 2   Global Step: 28840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:19,544-Speed 3062.23 samples/sec   Loss 13.6465   LearningRate 0.0781   Epoch: 2   Global Step: 28850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:22,884-Speed 3067.39 samples/sec   Loss 13.5387   LearningRate 0.0781   Epoch: 2   Global Step: 28860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:26,214-Speed 3076.21 samples/sec   Loss 13.6559   LearningRate 0.0781   Epoch: 2   Global Step: 28870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:29,580-Speed 3042.90 samples/sec   Loss 13.6790   LearningRate 0.0781   Epoch: 2   Global Step: 28880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:32,959-Speed 3031.00 samples/sec   Loss 13.4524   LearningRate 0.0781   Epoch: 2   Global Step: 28890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:36,249-Speed 3114.67 samples/sec   Loss 13.5575   LearningRate 0.0781   Epoch: 2   Global Step: 28900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:39,597-Speed 3059.25 samples/sec   Loss 13.4765   LearningRate 0.0781   Epoch: 2   Global Step: 28910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:42,897-Speed 3104.18 samples/sec   Loss 13.7536   LearningRate 0.0781   Epoch: 2   Global Step: 28920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:46,207-Speed 3094.42 samples/sec   Loss 13.6159   LearningRate 0.0781   Epoch: 2   Global Step: 28930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:49,513-Speed 3099.04 samples/sec   Loss 13.7722   LearningRate 0.0781   Epoch: 2   Global Step: 28940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:52,893-Speed 3030.42 samples/sec   Loss 13.4876   LearningRate 0.0781   Epoch: 2   Global Step: 28950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:56,204-Speed 3093.65 samples/sec   Loss 13.7255   LearningRate 0.0780   Epoch: 2   Global Step: 28960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:27:59,545-Speed 3065.20 samples/sec   Loss 13.5315   LearningRate 0.0780   Epoch: 2   Global Step: 28970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:28:02,880-Speed 3071.75 samples/sec   Loss 13.7524   LearningRate 0.0780   Epoch: 2   Global Step: 28980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:28:06,278-Speed 3014.48 samples/sec   Loss 13.6687   LearningRate 0.0780   Epoch: 2   Global Step: 28990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:28:09,618-Speed 3066.52 samples/sec   Loss 13.5905   LearningRate 0.0780   Epoch: 2   Global Step: 29000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:28:12,973-Speed 3053.54 samples/sec   Loss 13.5925   LearningRate 0.0780   Epoch: 2   Global Step: 29010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:28:16,376-Speed 3009.58 samples/sec   Loss 13.6221   LearningRate 0.0780   Epoch: 2   Global Step: 29020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:28:19,716-Speed 3066.90 samples/sec   Loss 13.7121   LearningRate 0.0780   Epoch: 2   Global Step: 29030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:28:23,064-Speed 3060.15 samples/sec   Loss 13.6336   LearningRate 0.0780   Epoch: 2   Global Step: 29040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:28:26,409-Speed 3061.56 samples/sec   Loss 13.6127   LearningRate 0.0780   Epoch: 2   Global Step: 29050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:28:29,723-Speed 3091.07 samples/sec   Loss 13.6533   LearningRate 0.0780   Epoch: 2   Global Step: 29060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:28:33,054-Speed 3075.36 samples/sec   Loss 13.4210   LearningRate 0.0780   Epoch: 2   Global Step: 29070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:28:36,321-Speed 3134.52 samples/sec   Loss 13.5733   LearningRate 0.0780   Epoch: 2   Global Step: 29080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:28:39,602-Speed 3122.74 samples/sec   Loss 13.5159   LearningRate 0.0780   Epoch: 2   Global Step: 29090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:28:42,881-Speed 3123.53 samples/sec   Loss 13.6542   LearningRate 0.0779   Epoch: 2   Global Step: 29100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:28:46,188-Speed 3097.76 samples/sec   Loss 13.5626   LearningRate 0.0779   Epoch: 2   Global Step: 29110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:28:49,522-Speed 3072.46 samples/sec   Loss 13.5604   LearningRate 0.0779   Epoch: 2   Global Step: 29120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:28:52,806-Speed 3119.04 samples/sec   Loss 13.6252   LearningRate 0.0779   Epoch: 2   Global Step: 29130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:28:56,150-Speed 3062.55 samples/sec   Loss 13.5751   LearningRate 0.0779   Epoch: 2   Global Step: 29140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:28:59,439-Speed 3114.59 samples/sec   Loss 13.6363   LearningRate 0.0779   Epoch: 2   Global Step: 29150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:29:02,768-Speed 3077.03 samples/sec   Loss 13.4658   LearningRate 0.0779   Epoch: 2   Global Step: 29160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:29:06,087-Speed 3085.59 samples/sec   Loss 13.7012   LearningRate 0.0779   Epoch: 2   Global Step: 29170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:29:09,386-Speed 3104.82 samples/sec   Loss 13.6240   LearningRate 0.0779   Epoch: 2   Global Step: 29180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:29:12,712-Speed 3080.26 samples/sec   Loss 13.6694   LearningRate 0.0779   Epoch: 2   Global Step: 29190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:29:15,989-Speed 3125.70 samples/sec   Loss 13.5740   LearningRate 0.0779   Epoch: 2   Global Step: 29200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:29:19,319-Speed 3075.81 samples/sec   Loss 13.6630   LearningRate 0.0779   Epoch: 2   Global Step: 29210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:29:22,604-Speed 3118.00 samples/sec   Loss 13.6160   LearningRate 0.0779   Epoch: 2   Global Step: 29220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:29:25,951-Speed 3059.87 samples/sec   Loss 13.6055   LearningRate 0.0779   Epoch: 2   Global Step: 29230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:29:29,353-Speed 3011.05 samples/sec   Loss 13.5223   LearningRate 0.0778   Epoch: 2   Global Step: 29240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:29:32,696-Speed 3063.98 samples/sec   Loss 13.5838   LearningRate 0.0778   Epoch: 2   Global Step: 29250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:29:36,039-Speed 3064.07 samples/sec   Loss 13.6459   LearningRate 0.0778   Epoch: 2   Global Step: 29260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:29:39,312-Speed 3129.78 samples/sec   Loss 13.6965   LearningRate 0.0778   Epoch: 2   Global Step: 29270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:29:42,657-Speed 3062.26 samples/sec   Loss 13.5480   LearningRate 0.0778   Epoch: 2   Global Step: 29280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:29:45,939-Speed 3120.09 samples/sec   Loss 13.5668   LearningRate 0.0778   Epoch: 2   Global Step: 29290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:29:49,258-Speed 3085.89 samples/sec   Loss 13.5390   LearningRate 0.0778   Epoch: 2   Global Step: 29300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:29:52,626-Speed 3041.55 samples/sec   Loss 13.7917   LearningRate 0.0778   Epoch: 2   Global Step: 29310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:29:56,027-Speed 3011.67 samples/sec   Loss 13.5181   LearningRate 0.0778   Epoch: 2   Global Step: 29320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:29:59,334-Speed 3097.48 samples/sec   Loss 13.5705   LearningRate 0.0778   Epoch: 2   Global Step: 29330   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:30:02,683-Speed 3059.20 samples/sec   Loss 13.5424   LearningRate 0.0778   Epoch: 2   Global Step: 29340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:30:06,042-Speed 3049.51 samples/sec   Loss 13.5363   LearningRate 0.0778   Epoch: 2   Global Step: 29350   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:30:09,392-Speed 3057.29 samples/sec   Loss 13.4735   LearningRate 0.0778   Epoch: 2   Global Step: 29360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:30:12,677-Speed 3118.78 samples/sec   Loss 13.6214   LearningRate 0.0778   Epoch: 2   Global Step: 29370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:30:16,032-Speed 3053.25 samples/sec   Loss 13.5268   LearningRate 0.0777   Epoch: 2   Global Step: 29380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:30:19,344-Speed 3092.08 samples/sec   Loss 13.6066   LearningRate 0.0777   Epoch: 2   Global Step: 29390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:30:22,668-Speed 3081.95 samples/sec   Loss 13.5516   LearningRate 0.0777   Epoch: 2   Global Step: 29400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:30:25,990-Speed 3083.47 samples/sec   Loss 13.6017   LearningRate 0.0777   Epoch: 2   Global Step: 29410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:30:29,325-Speed 3071.27 samples/sec   Loss 13.5056   LearningRate 0.0777   Epoch: 2   Global Step: 29420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:30:32,653-Speed 3078.21 samples/sec   Loss 13.5013   LearningRate 0.0777   Epoch: 2   Global Step: 29430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:30:35,940-Speed 3115.93 samples/sec   Loss 13.6319   LearningRate 0.0777   Epoch: 2   Global Step: 29440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:30:39,242-Speed 3101.36 samples/sec   Loss 13.4952   LearningRate 0.0777   Epoch: 2   Global Step: 29450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:30:42,571-Speed 3077.40 samples/sec   Loss 13.6023   LearningRate 0.0777   Epoch: 2   Global Step: 29460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:30:45,887-Speed 3088.80 samples/sec   Loss 13.6348   LearningRate 0.0777   Epoch: 2   Global Step: 29470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:30:49,180-Speed 3110.43 samples/sec   Loss 13.4611   LearningRate 0.0777   Epoch: 2   Global Step: 29480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:30:52,480-Speed 3104.10 samples/sec   Loss 13.4249   LearningRate 0.0777   Epoch: 2   Global Step: 29490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:30:55,815-Speed 3071.60 samples/sec   Loss 13.5336   LearningRate 0.0777   Epoch: 2   Global Step: 29500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:30:59,144-Speed 3076.90 samples/sec   Loss 13.5344   LearningRate 0.0777   Epoch: 2   Global Step: 29510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:31:02,479-Speed 3071.65 samples/sec   Loss 13.6293   LearningRate 0.0776   Epoch: 2   Global Step: 29520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:31:05,792-Speed 3091.53 samples/sec   Loss 13.4015   LearningRate 0.0776   Epoch: 2   Global Step: 29530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:31:09,076-Speed 3118.95 samples/sec   Loss 13.6521   LearningRate 0.0776   Epoch: 2   Global Step: 29540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:31:12,384-Speed 3096.70 samples/sec   Loss 13.4798   LearningRate 0.0776   Epoch: 2   Global Step: 29550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:31:15,649-Speed 3137.46 samples/sec   Loss 13.6713   LearningRate 0.0776   Epoch: 2   Global Step: 29560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:31:18,977-Speed 3076.88 samples/sec   Loss 13.5579   LearningRate 0.0776   Epoch: 2   Global Step: 29570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:31:22,307-Speed 3076.80 samples/sec   Loss 13.6432   LearningRate 0.0776   Epoch: 2   Global Step: 29580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:31:25,637-Speed 3075.93 samples/sec   Loss 13.4846   LearningRate 0.0776   Epoch: 2   Global Step: 29590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:31:28,891-Speed 3148.04 samples/sec   Loss 13.5823   LearningRate 0.0776   Epoch: 2   Global Step: 29600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:31:32,234-Speed 3063.54 samples/sec   Loss 13.5909   LearningRate 0.0776   Epoch: 2   Global Step: 29610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:31:35,572-Speed 3068.84 samples/sec   Loss 13.5511   LearningRate 0.0776   Epoch: 2   Global Step: 29620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:31:38,938-Speed 3042.58 samples/sec   Loss 13.5125   LearningRate 0.0776   Epoch: 2   Global Step: 29630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:31:42,244-Speed 3099.10 samples/sec   Loss 13.4525   LearningRate 0.0776   Epoch: 2   Global Step: 29640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:31:45,584-Speed 3066.42 samples/sec   Loss 13.5735   LearningRate 0.0776   Epoch: 2   Global Step: 29650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:31:48,913-Speed 3077.19 samples/sec   Loss 13.7072   LearningRate 0.0775   Epoch: 2   Global Step: 29660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:31:52,211-Speed 3106.53 samples/sec   Loss 13.6688   LearningRate 0.0775   Epoch: 2   Global Step: 29670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:31:55,513-Speed 3101.92 samples/sec   Loss 13.5709   LearningRate 0.0775   Epoch: 2   Global Step: 29680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:31:58,803-Speed 3112.62 samples/sec   Loss 13.6285   LearningRate 0.0775   Epoch: 2   Global Step: 29690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:02,169-Speed 3043.03 samples/sec   Loss 13.5757   LearningRate 0.0775   Epoch: 2   Global Step: 29700   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 04:32:05,464-Speed 3109.33 samples/sec   Loss 13.4587   LearningRate 0.0775   Epoch: 2   Global Step: 29710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:08,749-Speed 3117.86 samples/sec   Loss 13.6255   LearningRate 0.0775   Epoch: 2   Global Step: 29720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:12,090-Speed 3066.13 samples/sec   Loss 13.5453   LearningRate 0.0775   Epoch: 2   Global Step: 29730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:15,431-Speed 3065.50 samples/sec   Loss 13.5904   LearningRate 0.0775   Epoch: 2   Global Step: 29740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:18,768-Speed 3069.84 samples/sec   Loss 13.5913   LearningRate 0.0775   Epoch: 2   Global Step: 29750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:22,044-Speed 3126.71 samples/sec   Loss 13.3791   LearningRate 0.0775   Epoch: 2   Global Step: 29760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:25,401-Speed 3051.52 samples/sec   Loss 13.5971   LearningRate 0.0775   Epoch: 2   Global Step: 29770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:28,705-Speed 3100.25 samples/sec   Loss 13.5473   LearningRate 0.0775   Epoch: 2   Global Step: 29780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:32,036-Speed 3074.55 samples/sec   Loss 13.5548   LearningRate 0.0775   Epoch: 2   Global Step: 29790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:35,360-Speed 3082.25 samples/sec   Loss 13.4715   LearningRate 0.0774   Epoch: 2   Global Step: 29800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:38,675-Speed 3090.02 samples/sec   Loss 13.5135   LearningRate 0.0774   Epoch: 2   Global Step: 29810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:41,967-Speed 3111.32 samples/sec   Loss 13.6865   LearningRate 0.0774   Epoch: 2   Global Step: 29820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:32:45,239-Speed 3131.05 samples/sec   Loss 13.4668   LearningRate 0.0774   Epoch: 2   Global Step: 29830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:32:48,525-Speed 3116.86 samples/sec   Loss 13.5344   LearningRate 0.0774   Epoch: 2   Global Step: 29840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:32:51,855-Speed 3076.22 samples/sec   Loss 13.5162   LearningRate 0.0774   Epoch: 2   Global Step: 29850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:32:55,151-Speed 3107.71 samples/sec   Loss 13.4288   LearningRate 0.0774   Epoch: 2   Global Step: 29860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:32:58,442-Speed 3112.50 samples/sec   Loss 13.5423   LearningRate 0.0774   Epoch: 2   Global Step: 29870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:33:01,778-Speed 3070.54 samples/sec   Loss 13.5081   LearningRate 0.0774   Epoch: 2   Global Step: 29880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:33:05,114-Speed 3070.63 samples/sec   Loss 13.6163   LearningRate 0.0774   Epoch: 2   Global Step: 29890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:33:08,472-Speed 3049.58 samples/sec   Loss 13.4005   LearningRate 0.0774   Epoch: 2   Global Step: 29900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:33:11,797-Speed 3080.59 samples/sec   Loss 13.6140   LearningRate 0.0774   Epoch: 2   Global Step: 29910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:33:15,145-Speed 3060.31 samples/sec   Loss 13.4652   LearningRate 0.0774   Epoch: 2   Global Step: 29920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:33:18,435-Speed 3112.64 samples/sec   Loss 13.4755   LearningRate 0.0774   Epoch: 2   Global Step: 29930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:33:21,769-Speed 3073.04 samples/sec   Loss 13.4874   LearningRate 0.0773   Epoch: 2   Global Step: 29940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:33:25,133-Speed 3045.23 samples/sec   Loss 13.5419   LearningRate 0.0773   Epoch: 2   Global Step: 29950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:33:28,431-Speed 3105.80 samples/sec   Loss 13.2860   LearningRate 0.0773   Epoch: 2   Global Step: 29960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:33:31,802-Speed 3038.11 samples/sec   Loss 13.4868   LearningRate 0.0773   Epoch: 2   Global Step: 29970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:33:35,107-Speed 3098.80 samples/sec   Loss 13.4528   LearningRate 0.0773   Epoch: 2   Global Step: 29980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:33:38,407-Speed 3104.59 samples/sec   Loss 13.5212   LearningRate 0.0773   Epoch: 2   Global Step: 29990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:33:41,708-Speed 3102.68 samples/sec   Loss 13.4855   LearningRate 0.0773   Epoch: 2   Global Step: 30000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:33:45,063-Speed 3052.74 samples/sec   Loss 13.6108   LearningRate 0.0773   Epoch: 2   Global Step: 30010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:33:48,408-Speed 3062.93 samples/sec   Loss 13.5350   LearningRate 0.0773   Epoch: 2   Global Step: 30020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:33:51,683-Speed 3126.94 samples/sec   Loss 13.4450   LearningRate 0.0773   Epoch: 2   Global Step: 30030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:33:55,012-Speed 3077.60 samples/sec   Loss 13.4643   LearningRate 0.0773   Epoch: 2   Global Step: 30040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:33:58,376-Speed 3044.31 samples/sec   Loss 13.4905   LearningRate 0.0773   Epoch: 2   Global Step: 30050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:01,770-Speed 3018.95 samples/sec   Loss 13.4316   LearningRate 0.0773   Epoch: 2   Global Step: 30060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:05,206-Speed 2980.53 samples/sec   Loss 13.3925   LearningRate 0.0773   Epoch: 2   Global Step: 30070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:08,516-Speed 3095.05 samples/sec   Loss 13.3680   LearningRate 0.0772   Epoch: 2   Global Step: 30080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:11,871-Speed 3053.26 samples/sec   Loss 13.5058   LearningRate 0.0772   Epoch: 2   Global Step: 30090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:15,203-Speed 3073.93 samples/sec   Loss 13.6540   LearningRate 0.0772   Epoch: 2   Global Step: 30100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:18,544-Speed 3065.83 samples/sec   Loss 13.5310   LearningRate 0.0772   Epoch: 2   Global Step: 30110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:21,883-Speed 3068.07 samples/sec   Loss 13.4579   LearningRate 0.0772   Epoch: 2   Global Step: 30120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:25,167-Speed 3119.53 samples/sec   Loss 13.5952   LearningRate 0.0772   Epoch: 2   Global Step: 30130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:28,463-Speed 3108.05 samples/sec   Loss 13.5606   LearningRate 0.0772   Epoch: 2   Global Step: 30140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:31,809-Speed 3060.60 samples/sec   Loss 13.4754   LearningRate 0.0772   Epoch: 2   Global Step: 30150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:35,088-Speed 3124.55 samples/sec   Loss 13.6887   LearningRate 0.0772   Epoch: 2   Global Step: 30160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:38,365-Speed 3125.35 samples/sec   Loss 13.6072   LearningRate 0.0772   Epoch: 2   Global Step: 30170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:41,668-Speed 3101.65 samples/sec   Loss 13.6134   LearningRate 0.0772   Epoch: 2   Global Step: 30180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:44,974-Speed 3097.51 samples/sec   Loss 13.7901   LearningRate 0.0772   Epoch: 2   Global Step: 30190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:48,314-Speed 3067.10 samples/sec   Loss 13.4991   LearningRate 0.0772   Epoch: 2   Global Step: 30200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:51,650-Speed 3070.73 samples/sec   Loss 13.3225   LearningRate 0.0772   Epoch: 2   Global Step: 30210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:54,985-Speed 3071.39 samples/sec   Loss 13.4219   LearningRate 0.0772   Epoch: 2   Global Step: 30220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:34:58,282-Speed 3107.45 samples/sec   Loss 13.5371   LearningRate 0.0771   Epoch: 2   Global Step: 30230   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 04:35:01,578-Speed 3106.99 samples/sec   Loss 13.3801   LearningRate 0.0771   Epoch: 2   Global Step: 30240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:35:04,893-Speed 3090.30 samples/sec   Loss 13.6691   LearningRate 0.0771   Epoch: 2   Global Step: 30250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:35:08,203-Speed 3094.10 samples/sec   Loss 13.4870   LearningRate 0.0771   Epoch: 2   Global Step: 30260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:35:11,524-Speed 3085.03 samples/sec   Loss 13.4271   LearningRate 0.0771   Epoch: 2   Global Step: 30270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:35:14,836-Speed 3092.61 samples/sec   Loss 13.6125   LearningRate 0.0771   Epoch: 2   Global Step: 30280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:35:18,111-Speed 3126.97 samples/sec   Loss 13.6459   LearningRate 0.0771   Epoch: 2   Global Step: 30290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:35:21,409-Speed 3106.72 samples/sec   Loss 13.4646   LearningRate 0.0771   Epoch: 2   Global Step: 30300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:35:24,705-Speed 3107.10 samples/sec   Loss 13.6533   LearningRate 0.0771   Epoch: 2   Global Step: 30310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:35:28,000-Speed 3108.52 samples/sec   Loss 13.4745   LearningRate 0.0771   Epoch: 2   Global Step: 30320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:35:31,300-Speed 3104.65 samples/sec   Loss 13.4194   LearningRate 0.0771   Epoch: 2   Global Step: 30330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:35:34,577-Speed 3125.67 samples/sec   Loss 13.5321   LearningRate 0.0771   Epoch: 2   Global Step: 30340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:35:37,880-Speed 3100.32 samples/sec   Loss 13.4864   LearningRate 0.0771   Epoch: 2   Global Step: 30350   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:35:41,242-Speed 3046.44 samples/sec   Loss 13.3899   LearningRate 0.0771   Epoch: 2   Global Step: 30360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:35:44,595-Speed 3055.51 samples/sec   Loss 13.5554   LearningRate 0.0770   Epoch: 2   Global Step: 30370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:35:47,990-Speed 3017.06 samples/sec   Loss 13.5243   LearningRate 0.0770   Epoch: 2   Global Step: 30380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:35:51,341-Speed 3057.27 samples/sec   Loss 13.6358   LearningRate 0.0770   Epoch: 2   Global Step: 30390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:35:54,664-Speed 3081.63 samples/sec   Loss 13.4809   LearningRate 0.0770   Epoch: 2   Global Step: 30400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:35:58,022-Speed 3050.41 samples/sec   Loss 13.3779   LearningRate 0.0770   Epoch: 2   Global Step: 30410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:36:01,383-Speed 3047.35 samples/sec   Loss 13.5444   LearningRate 0.0770   Epoch: 2   Global Step: 30420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:36:04,824-Speed 2976.99 samples/sec   Loss 13.4681   LearningRate 0.0770   Epoch: 2   Global Step: 30430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:36:08,124-Speed 3104.70 samples/sec   Loss 13.5171   LearningRate 0.0770   Epoch: 2   Global Step: 30440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:36:11,421-Speed 3106.24 samples/sec   Loss 13.5270   LearningRate 0.0770   Epoch: 2   Global Step: 30450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:36:14,749-Speed 3077.45 samples/sec   Loss 13.5385   LearningRate 0.0770   Epoch: 2   Global Step: 30460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:36:18,053-Speed 3101.35 samples/sec   Loss 13.5951   LearningRate 0.0770   Epoch: 2   Global Step: 30470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:36:21,380-Speed 3077.94 samples/sec   Loss 13.5470   LearningRate 0.0770   Epoch: 2   Global Step: 30480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:36:24,640-Speed 3142.26 samples/sec   Loss 13.5981   LearningRate 0.0770   Epoch: 2   Global Step: 30490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:36:28,008-Speed 3040.93 samples/sec   Loss 13.5759   LearningRate 0.0770   Epoch: 2   Global Step: 30500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:36:31,312-Speed 3100.09 samples/sec   Loss 13.5353   LearningRate 0.0769   Epoch: 2   Global Step: 30510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:36:34,638-Speed 3080.28 samples/sec   Loss 13.5841   LearningRate 0.0769   Epoch: 2   Global Step: 30520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:36:37,947-Speed 3095.33 samples/sec   Loss 13.4099   LearningRate 0.0769   Epoch: 2   Global Step: 30530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:36:41,273-Speed 3078.87 samples/sec   Loss 13.5408   LearningRate 0.0769   Epoch: 2   Global Step: 30540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:36:44,542-Speed 3133.71 samples/sec   Loss 13.5174   LearningRate 0.0769   Epoch: 2   Global Step: 30550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:36:47,830-Speed 3115.10 samples/sec   Loss 13.4870   LearningRate 0.0769   Epoch: 2   Global Step: 30560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:36:51,100-Speed 3132.97 samples/sec   Loss 13.4365   LearningRate 0.0769   Epoch: 2   Global Step: 30570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:36:54,359-Speed 3142.41 samples/sec   Loss 13.4770   LearningRate 0.0769   Epoch: 2   Global Step: 30580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:36:57,671-Speed 3092.39 samples/sec   Loss 13.4171   LearningRate 0.0769   Epoch: 2   Global Step: 30590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:00,994-Speed 3082.88 samples/sec   Loss 13.5638   LearningRate 0.0769   Epoch: 2   Global Step: 30600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:04,280-Speed 3117.40 samples/sec   Loss 13.3821   LearningRate 0.0769   Epoch: 2   Global Step: 30610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:07,588-Speed 3096.43 samples/sec   Loss 13.3757   LearningRate 0.0769   Epoch: 2   Global Step: 30620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:10,866-Speed 3124.03 samples/sec   Loss 13.3019   LearningRate 0.0769   Epoch: 2   Global Step: 30630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:14,175-Speed 3095.12 samples/sec   Loss 13.6636   LearningRate 0.0769   Epoch: 2   Global Step: 30640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:17,464-Speed 3114.34 samples/sec   Loss 13.6339   LearningRate 0.0768   Epoch: 2   Global Step: 30650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:20,744-Speed 3123.22 samples/sec   Loss 13.5717   LearningRate 0.0768   Epoch: 2   Global Step: 30660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:24,082-Speed 3068.73 samples/sec   Loss 13.4007   LearningRate 0.0768   Epoch: 2   Global Step: 30670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:27,378-Speed 3107.99 samples/sec   Loss 13.5280   LearningRate 0.0768   Epoch: 2   Global Step: 30680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:30,750-Speed 3037.30 samples/sec   Loss 13.3607   LearningRate 0.0768   Epoch: 2   Global Step: 30690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:34,119-Speed 3040.91 samples/sec   Loss 13.5628   LearningRate 0.0768   Epoch: 2   Global Step: 30700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:37,424-Speed 3099.49 samples/sec   Loss 13.5927   LearningRate 0.0768   Epoch: 2   Global Step: 30710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:40,716-Speed 3111.17 samples/sec   Loss 13.3972   LearningRate 0.0768   Epoch: 2   Global Step: 30720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:44,014-Speed 3105.65 samples/sec   Loss 13.5299   LearningRate 0.0768   Epoch: 2   Global Step: 30730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:47,319-Speed 3099.54 samples/sec   Loss 13.3788   LearningRate 0.0768   Epoch: 2   Global Step: 30740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:50,649-Speed 3076.08 samples/sec   Loss 13.5657   LearningRate 0.0768   Epoch: 2   Global Step: 30750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:54,003-Speed 3054.18 samples/sec   Loss 13.4401   LearningRate 0.0768   Epoch: 2   Global Step: 30760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:37:57,337-Speed 3072.06 samples/sec   Loss 13.4708   LearningRate 0.0768   Epoch: 2   Global Step: 30770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:00,687-Speed 3057.14 samples/sec   Loss 13.5139   LearningRate 0.0768   Epoch: 2   Global Step: 30780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:04,016-Speed 3076.73 samples/sec   Loss 13.4310   LearningRate 0.0767   Epoch: 2   Global Step: 30790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:07,363-Speed 3061.08 samples/sec   Loss 13.4078   LearningRate 0.0767   Epoch: 2   Global Step: 30800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:10,704-Speed 3065.81 samples/sec   Loss 13.5306   LearningRate 0.0767   Epoch: 2   Global Step: 30810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:14,054-Speed 3057.89 samples/sec   Loss 13.5867   LearningRate 0.0767   Epoch: 2   Global Step: 30820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:17,413-Speed 3049.49 samples/sec   Loss 13.3574   LearningRate 0.0767   Epoch: 2   Global Step: 30830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:20,711-Speed 3105.70 samples/sec   Loss 13.4334   LearningRate 0.0767   Epoch: 2   Global Step: 30840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:24,024-Speed 3091.41 samples/sec   Loss 13.3404   LearningRate 0.0767   Epoch: 2   Global Step: 30850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:27,356-Speed 3074.39 samples/sec   Loss 13.6204   LearningRate 0.0767   Epoch: 2   Global Step: 30860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:30,682-Speed 3079.83 samples/sec   Loss 13.3963   LearningRate 0.0767   Epoch: 2   Global Step: 30870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:38:33,955-Speed 3129.82 samples/sec   Loss 13.4417   LearningRate 0.0767   Epoch: 2   Global Step: 30880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:38:37,263-Speed 3096.06 samples/sec   Loss 13.4320   LearningRate 0.0767   Epoch: 2   Global Step: 30890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:38:40,627-Speed 3044.89 samples/sec   Loss 13.5540   LearningRate 0.0767   Epoch: 2   Global Step: 30900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:38:43,945-Speed 3086.88 samples/sec   Loss 13.3726   LearningRate 0.0767   Epoch: 2   Global Step: 30910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:47,286-Speed 3065.73 samples/sec   Loss 13.4619   LearningRate 0.0767   Epoch: 2   Global Step: 30920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:50,662-Speed 3033.77 samples/sec   Loss 13.4389   LearningRate 0.0766   Epoch: 2   Global Step: 30930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:54,010-Speed 3059.80 samples/sec   Loss 13.6939   LearningRate 0.0766   Epoch: 2   Global Step: 30940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:38:57,331-Speed 3084.87 samples/sec   Loss 13.5801   LearningRate 0.0766   Epoch: 2   Global Step: 30950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:39:00,653-Speed 3083.21 samples/sec   Loss 13.2562   LearningRate 0.0766   Epoch: 2   Global Step: 30960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:39:04,052-Speed 3013.15 samples/sec   Loss 13.6125   LearningRate 0.0766   Epoch: 2   Global Step: 30970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:39:07,433-Speed 3029.60 samples/sec   Loss 13.3350   LearningRate 0.0766   Epoch: 2   Global Step: 30980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:39:10,743-Speed 3094.29 samples/sec   Loss 13.4758   LearningRate 0.0766   Epoch: 2   Global Step: 30990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:39:14,075-Speed 3074.50 samples/sec   Loss 13.3469   LearningRate 0.0766   Epoch: 2   Global Step: 31000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:39:17,409-Speed 3072.95 samples/sec   Loss 13.4119   LearningRate 0.0766   Epoch: 2   Global Step: 31010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:39:20,735-Speed 3079.73 samples/sec   Loss 13.5339   LearningRate 0.0766   Epoch: 2   Global Step: 31020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:39:24,073-Speed 3068.70 samples/sec   Loss 13.4494   LearningRate 0.0766   Epoch: 2   Global Step: 31030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:39:27,398-Speed 3080.07 samples/sec   Loss 13.4265   LearningRate 0.0766   Epoch: 2   Global Step: 31040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:39:30,739-Speed 3066.30 samples/sec   Loss 13.5099   LearningRate 0.0766   Epoch: 2   Global Step: 31050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:39:34,090-Speed 3056.53 samples/sec   Loss 13.4380   LearningRate 0.0766   Epoch: 2   Global Step: 31060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:39:37,424-Speed 3072.17 samples/sec   Loss 13.4083   LearningRate 0.0766   Epoch: 2   Global Step: 31070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:39:40,773-Speed 3058.66 samples/sec   Loss 13.4291   LearningRate 0.0765   Epoch: 2   Global Step: 31080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:39:44,142-Speed 3040.04 samples/sec   Loss 13.4630   LearningRate 0.0765   Epoch: 2   Global Step: 31090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:39:47,513-Speed 3038.86 samples/sec   Loss 13.4754   LearningRate 0.0765   Epoch: 2   Global Step: 31100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:39:50,854-Speed 3065.72 samples/sec   Loss 13.4215   LearningRate 0.0765   Epoch: 2   Global Step: 31110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:39:54,172-Speed 3087.36 samples/sec   Loss 13.5064   LearningRate 0.0765   Epoch: 2   Global Step: 31120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:39:57,514-Speed 3066.62 samples/sec   Loss 13.5161   LearningRate 0.0765   Epoch: 2   Global Step: 31130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:40:00,853-Speed 3067.14 samples/sec   Loss 13.5224   LearningRate 0.0765   Epoch: 2   Global Step: 31140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:40:04,159-Speed 3098.50 samples/sec   Loss 13.4345   LearningRate 0.0765   Epoch: 2   Global Step: 31150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:40:07,494-Speed 3071.95 samples/sec   Loss 13.4170   LearningRate 0.0765   Epoch: 2   Global Step: 31160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:40:10,749-Speed 3146.58 samples/sec   Loss 13.3311   LearningRate 0.0765   Epoch: 2   Global Step: 31170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:40:14,072-Speed 3081.75 samples/sec   Loss 13.3474   LearningRate 0.0765   Epoch: 2   Global Step: 31180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:40:17,371-Speed 3105.18 samples/sec   Loss 13.4607   LearningRate 0.0765   Epoch: 2   Global Step: 31190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:40:20,720-Speed 3058.94 samples/sec   Loss 13.5553   LearningRate 0.0765   Epoch: 2   Global Step: 31200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:40:24,049-Speed 3076.09 samples/sec   Loss 13.5768   LearningRate 0.0765   Epoch: 2   Global Step: 31210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:40:27,370-Speed 3084.82 samples/sec   Loss 13.5386   LearningRate 0.0764   Epoch: 2   Global Step: 31220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:40:30,695-Speed 3080.55 samples/sec   Loss 13.4436   LearningRate 0.0764   Epoch: 2   Global Step: 31230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:40:34,076-Speed 3029.55 samples/sec   Loss 13.4487   LearningRate 0.0764   Epoch: 2   Global Step: 31240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:40:37,414-Speed 3068.64 samples/sec   Loss 13.4645   LearningRate 0.0764   Epoch: 2   Global Step: 31250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:40:40,738-Speed 3081.58 samples/sec   Loss 13.2276   LearningRate 0.0764   Epoch: 2   Global Step: 31260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:40:44,046-Speed 3096.09 samples/sec   Loss 13.4734   LearningRate 0.0764   Epoch: 2   Global Step: 31270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:40:47,355-Speed 3095.44 samples/sec   Loss 13.2908   LearningRate 0.0764   Epoch: 2   Global Step: 31280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:40:50,711-Speed 3052.02 samples/sec   Loss 13.3224   LearningRate 0.0764   Epoch: 2   Global Step: 31290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:40:54,053-Speed 3064.84 samples/sec   Loss 13.5574   LearningRate 0.0764   Epoch: 2   Global Step: 31300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:40:57,482-Speed 2988.09 samples/sec   Loss 13.4098   LearningRate 0.0764   Epoch: 2   Global Step: 31310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:41:00,828-Speed 3060.73 samples/sec   Loss 13.6187   LearningRate 0.0764   Epoch: 2   Global Step: 31320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:41:04,099-Speed 3131.81 samples/sec   Loss 13.2752   LearningRate 0.0764   Epoch: 2   Global Step: 31330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:41:07,422-Speed 3082.26 samples/sec   Loss 13.4369   LearningRate 0.0764   Epoch: 2   Global Step: 31340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:41:10,747-Speed 3080.63 samples/sec   Loss 13.4517   LearningRate 0.0764   Epoch: 2   Global Step: 31350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:41:14,049-Speed 3102.34 samples/sec   Loss 13.4722   LearningRate 0.0763   Epoch: 2   Global Step: 31360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:41:17,382-Speed 3073.71 samples/sec   Loss 13.4252   LearningRate 0.0763   Epoch: 2   Global Step: 31370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:41:20,715-Speed 3073.29 samples/sec   Loss 13.4390   LearningRate 0.0763   Epoch: 2   Global Step: 31380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:41:24,062-Speed 3060.37 samples/sec   Loss 13.4766   LearningRate 0.0763   Epoch: 2   Global Step: 31390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:41:27,364-Speed 3101.67 samples/sec   Loss 13.3338   LearningRate 0.0763   Epoch: 2   Global Step: 31400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:41:30,640-Speed 3126.59 samples/sec   Loss 13.3974   LearningRate 0.0763   Epoch: 2   Global Step: 31410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:41:33,939-Speed 3104.80 samples/sec   Loss 13.3562   LearningRate 0.0763   Epoch: 2   Global Step: 31420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:41:37,236-Speed 3106.47 samples/sec   Loss 13.5917   LearningRate 0.0763   Epoch: 2   Global Step: 31430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:41:40,526-Speed 3112.93 samples/sec   Loss 13.3540   LearningRate 0.0763   Epoch: 2   Global Step: 31440   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:41:43,861-Speed 3071.79 samples/sec   Loss 13.3386   LearningRate 0.0763   Epoch: 2   Global Step: 31450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:41:47,205-Speed 3063.59 samples/sec   Loss 13.4774   LearningRate 0.0763   Epoch: 2   Global Step: 31460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:41:50,478-Speed 3129.40 samples/sec   Loss 13.2701   LearningRate 0.0763   Epoch: 2   Global Step: 31470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:41:53,771-Speed 3110.55 samples/sec   Loss 13.4248   LearningRate 0.0763   Epoch: 2   Global Step: 31480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 04:41:57,052-Speed 3121.87 samples/sec   Loss 13.3128   LearningRate 0.0763   Epoch: 2   Global Step: 31490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:42:00,371-Speed 3088.80 samples/sec   Loss 13.3343   LearningRate 0.0762   Epoch: 2   Global Step: 31500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:42:03,675-Speed 3100.25 samples/sec   Loss 13.3881   LearningRate 0.0762   Epoch: 2   Global Step: 31510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:42:07,004-Speed 3076.62 samples/sec   Loss 13.3343   LearningRate 0.0762   Epoch: 2   Global Step: 31520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:42:10,316-Speed 3092.77 samples/sec   Loss 13.3357   LearningRate 0.0762   Epoch: 2   Global Step: 31530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:42:13,590-Speed 3128.88 samples/sec   Loss 13.3361   LearningRate 0.0762   Epoch: 2   Global Step: 31540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:42:16,923-Speed 3073.35 samples/sec   Loss 13.3159   LearningRate 0.0762   Epoch: 2   Global Step: 31550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:42:20,299-Speed 3034.09 samples/sec   Loss 13.3887   LearningRate 0.0762   Epoch: 2   Global Step: 31560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:42:23,621-Speed 3083.56 samples/sec   Loss 13.3568   LearningRate 0.0762   Epoch: 2   Global Step: 31570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:42:26,905-Speed 3118.91 samples/sec   Loss 13.3530   LearningRate 0.0762   Epoch: 2   Global Step: 31580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:42:30,263-Speed 3050.13 samples/sec   Loss 13.3434   LearningRate 0.0762   Epoch: 2   Global Step: 31590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:42:33,590-Speed 3079.00 samples/sec   Loss 13.4376   LearningRate 0.0762   Epoch: 2   Global Step: 31600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:42:36,888-Speed 3106.36 samples/sec   Loss 13.3450   LearningRate 0.0762   Epoch: 2   Global Step: 31610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:42:40,172-Speed 3118.72 samples/sec   Loss 13.4780   LearningRate 0.0762   Epoch: 2   Global Step: 31620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:42:43,488-Speed 3089.29 samples/sec   Loss 13.5170   LearningRate 0.0762   Epoch: 2   Global Step: 31630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:42:46,795-Speed 3097.69 samples/sec   Loss 13.3914   LearningRate 0.0761   Epoch: 2   Global Step: 31640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:42:50,104-Speed 3095.39 samples/sec   Loss 13.2141   LearningRate 0.0761   Epoch: 2   Global Step: 31650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:42:53,420-Speed 3088.96 samples/sec   Loss 13.2043   LearningRate 0.0761   Epoch: 2   Global Step: 31660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:42:56,751-Speed 3075.09 samples/sec   Loss 13.3591   LearningRate 0.0761   Epoch: 2   Global Step: 31670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:43:00,032-Speed 3121.56 samples/sec   Loss 13.2684   LearningRate 0.0761   Epoch: 2   Global Step: 31680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:43:03,333-Speed 3103.51 samples/sec   Loss 13.4536   LearningRate 0.0761   Epoch: 2   Global Step: 31690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:43:06,666-Speed 3073.55 samples/sec   Loss 13.4442   LearningRate 0.0761   Epoch: 2   Global Step: 31700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:43:10,007-Speed 3065.34 samples/sec   Loss 13.5985   LearningRate 0.0761   Epoch: 2   Global Step: 31710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:43:13,299-Speed 3111.60 samples/sec   Loss 13.4507   LearningRate 0.0761   Epoch: 2   Global Step: 31720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:43:16,609-Speed 3094.46 samples/sec   Loss 13.3228   LearningRate 0.0761   Epoch: 2   Global Step: 31730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:43:19,953-Speed 3063.40 samples/sec   Loss 13.3683   LearningRate 0.0761   Epoch: 2   Global Step: 31740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:43:23,297-Speed 3063.37 samples/sec   Loss 13.3618   LearningRate 0.0761   Epoch: 2   Global Step: 31750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:43:26,682-Speed 3025.79 samples/sec   Loss 13.3550   LearningRate 0.0761   Epoch: 2   Global Step: 31760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:43:29,954-Speed 3130.18 samples/sec   Loss 13.5097   LearningRate 0.0761   Epoch: 2   Global Step: 31770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:43:33,341-Speed 3023.62 samples/sec   Loss 13.3509   LearningRate 0.0761   Epoch: 2   Global Step: 31780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:43:36,621-Speed 3123.08 samples/sec   Loss 13.4595   LearningRate 0.0760   Epoch: 2   Global Step: 31790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:43:39,967-Speed 3061.79 samples/sec   Loss 13.3864   LearningRate 0.0760   Epoch: 2   Global Step: 31800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:43:43,363-Speed 3015.99 samples/sec   Loss 13.2631   LearningRate 0.0760   Epoch: 2   Global Step: 31810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:43:46,711-Speed 3059.13 samples/sec   Loss 13.3045   LearningRate 0.0760   Epoch: 2   Global Step: 31820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:43:50,039-Speed 3078.43 samples/sec   Loss 13.1801   LearningRate 0.0760   Epoch: 2   Global Step: 31830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:43:53,349-Speed 3094.11 samples/sec   Loss 13.4373   LearningRate 0.0760   Epoch: 2   Global Step: 31840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:43:56,666-Speed 3088.62 samples/sec   Loss 13.3657   LearningRate 0.0760   Epoch: 2   Global Step: 31850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:43:59,970-Speed 3100.67 samples/sec   Loss 13.3814   LearningRate 0.0760   Epoch: 2   Global Step: 31860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:44:03,311-Speed 3065.52 samples/sec   Loss 13.4375   LearningRate 0.0760   Epoch: 2   Global Step: 31870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:44:06,646-Speed 3071.59 samples/sec   Loss 13.2931   LearningRate 0.0760   Epoch: 2   Global Step: 31880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:44:10,019-Speed 3036.83 samples/sec   Loss 13.3725   LearningRate 0.0760   Epoch: 2   Global Step: 31890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:44:13,309-Speed 3113.48 samples/sec   Loss 13.2024   LearningRate 0.0760   Epoch: 2   Global Step: 31900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:44:16,704-Speed 3016.81 samples/sec   Loss 13.4280   LearningRate 0.0760   Epoch: 2   Global Step: 31910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:44:19,977-Speed 3129.93 samples/sec   Loss 13.4858   LearningRate 0.0760   Epoch: 2   Global Step: 31920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:44:23,312-Speed 3070.46 samples/sec   Loss 13.4076   LearningRate 0.0759   Epoch: 2   Global Step: 31930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:44:26,635-Speed 3082.64 samples/sec   Loss 13.2960   LearningRate 0.0759   Epoch: 2   Global Step: 31940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:44:29,941-Speed 3098.78 samples/sec   Loss 13.3431   LearningRate 0.0759   Epoch: 2   Global Step: 31950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:44:33,216-Speed 3126.74 samples/sec   Loss 13.2922   LearningRate 0.0759   Epoch: 2   Global Step: 31960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:44:36,547-Speed 3076.08 samples/sec   Loss 13.4354   LearningRate 0.0759   Epoch: 2   Global Step: 31970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:44:39,860-Speed 3091.33 samples/sec   Loss 13.4107   LearningRate 0.0759   Epoch: 2   Global Step: 31980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:44:43,145-Speed 3117.59 samples/sec   Loss 13.3609   LearningRate 0.0759   Epoch: 2   Global Step: 31990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:44:46,449-Speed 3100.23 samples/sec   Loss 13.2766   LearningRate 0.0759   Epoch: 2   Global Step: 32000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:44:49,731-Speed 3120.88 samples/sec   Loss 13.3433   LearningRate 0.0759   Epoch: 2   Global Step: 32010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:44:53,026-Speed 3109.09 samples/sec   Loss 13.4065   LearningRate 0.0759   Epoch: 2   Global Step: 32020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:44:56,314-Speed 3115.44 samples/sec   Loss 13.4407   LearningRate 0.0759   Epoch: 2   Global Step: 32030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:44:59,659-Speed 3061.39 samples/sec   Loss 13.3698   LearningRate 0.0759   Epoch: 2   Global Step: 32040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:45:02,974-Speed 3090.00 samples/sec   Loss 13.3837   LearningRate 0.0759   Epoch: 2   Global Step: 32050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:45:06,290-Speed 3089.68 samples/sec   Loss 13.5212   LearningRate 0.0759   Epoch: 2   Global Step: 32060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:45:09,622-Speed 3074.42 samples/sec   Loss 13.3323   LearningRate 0.0758   Epoch: 2   Global Step: 32070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:45:12,913-Speed 3113.18 samples/sec   Loss 13.4049   LearningRate 0.0758   Epoch: 2   Global Step: 32080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:45:16,244-Speed 3074.83 samples/sec   Loss 13.1521   LearningRate 0.0758   Epoch: 2   Global Step: 32090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:45:19,601-Speed 3051.02 samples/sec   Loss 13.4845   LearningRate 0.0758   Epoch: 2   Global Step: 32100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:45:22,971-Speed 3039.76 samples/sec   Loss 13.4205   LearningRate 0.0758   Epoch: 2   Global Step: 32110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:45:26,331-Speed 3049.30 samples/sec   Loss 13.2009   LearningRate 0.0758   Epoch: 2   Global Step: 32120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:45:29,668-Speed 3068.82 samples/sec   Loss 13.3476   LearningRate 0.0758   Epoch: 2   Global Step: 32130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:45:33,032-Speed 3045.41 samples/sec   Loss 13.3135   LearningRate 0.0758   Epoch: 2   Global Step: 32140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:45:36,427-Speed 3016.76 samples/sec   Loss 13.3678   LearningRate 0.0758   Epoch: 2   Global Step: 32150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:45:39,735-Speed 3097.15 samples/sec   Loss 13.3103   LearningRate 0.0758   Epoch: 2   Global Step: 32160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:45:43,069-Speed 3072.24 samples/sec   Loss 13.3128   LearningRate 0.0758   Epoch: 2   Global Step: 32170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:45:46,386-Speed 3087.43 samples/sec   Loss 13.3189   LearningRate 0.0758   Epoch: 2   Global Step: 32180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:45:49,715-Speed 3077.10 samples/sec   Loss 13.5368   LearningRate 0.0758   Epoch: 2   Global Step: 32190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:45:53,066-Speed 3056.78 samples/sec   Loss 13.3019   LearningRate 0.0758   Epoch: 2   Global Step: 32200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:45:56,358-Speed 3111.75 samples/sec   Loss 13.2985   LearningRate 0.0757   Epoch: 2   Global Step: 32210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:45:59,691-Speed 3072.57 samples/sec   Loss 13.2087   LearningRate 0.0757   Epoch: 2   Global Step: 32220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:03,009-Speed 3086.98 samples/sec   Loss 13.3176   LearningRate 0.0757   Epoch: 2   Global Step: 32230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:06,327-Speed 3087.11 samples/sec   Loss 13.2457   LearningRate 0.0757   Epoch: 2   Global Step: 32240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:09,628-Speed 3103.50 samples/sec   Loss 13.3458   LearningRate 0.0757   Epoch: 2   Global Step: 32250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:12,995-Speed 3042.26 samples/sec   Loss 13.3388   LearningRate 0.0757   Epoch: 2   Global Step: 32260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:16,360-Speed 3043.76 samples/sec   Loss 13.3952   LearningRate 0.0757   Epoch: 2   Global Step: 32270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:19,638-Speed 3124.67 samples/sec   Loss 13.4457   LearningRate 0.0757   Epoch: 2   Global Step: 32280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:23,000-Speed 3046.44 samples/sec   Loss 13.3057   LearningRate 0.0757   Epoch: 2   Global Step: 32290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:26,342-Speed 3065.41 samples/sec   Loss 13.3968   LearningRate 0.0757   Epoch: 2   Global Step: 32300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:29,732-Speed 3021.52 samples/sec   Loss 13.3266   LearningRate 0.0757   Epoch: 2   Global Step: 32310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:33,051-Speed 3086.29 samples/sec   Loss 13.3524   LearningRate 0.0757   Epoch: 2   Global Step: 32320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:36,359-Speed 3096.99 samples/sec   Loss 13.3455   LearningRate 0.0757   Epoch: 2   Global Step: 32330   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:39,633-Speed 3127.73 samples/sec   Loss 13.2124   LearningRate 0.0757   Epoch: 2   Global Step: 32340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:42,993-Speed 3049.12 samples/sec   Loss 13.3080   LearningRate 0.0757   Epoch: 2   Global Step: 32350   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:46,275-Speed 3120.32 samples/sec   Loss 13.3328   LearningRate 0.0756   Epoch: 2   Global Step: 32360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:49,630-Speed 3053.26 samples/sec   Loss 13.1978   LearningRate 0.0756   Epoch: 2   Global Step: 32370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:52,919-Speed 3114.51 samples/sec   Loss 13.3372   LearningRate 0.0756   Epoch: 2   Global Step: 32380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:56,307-Speed 3023.35 samples/sec   Loss 13.2886   LearningRate 0.0756   Epoch: 2   Global Step: 32390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:46:59,718-Speed 3002.61 samples/sec   Loss 13.2115   LearningRate 0.0756   Epoch: 2   Global Step: 32400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:02,996-Speed 3124.42 samples/sec   Loss 13.4155   LearningRate 0.0756   Epoch: 2   Global Step: 32410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:06,380-Speed 3027.59 samples/sec   Loss 13.2229   LearningRate 0.0756   Epoch: 2   Global Step: 32420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:09,722-Speed 3064.87 samples/sec   Loss 13.2516   LearningRate 0.0756   Epoch: 2   Global Step: 32430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:13,039-Speed 3087.99 samples/sec   Loss 13.3417   LearningRate 0.0756   Epoch: 2   Global Step: 32440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:16,332-Speed 3111.33 samples/sec   Loss 13.4043   LearningRate 0.0756   Epoch: 2   Global Step: 32450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:19,685-Speed 3054.08 samples/sec   Loss 13.3355   LearningRate 0.0756   Epoch: 2   Global Step: 32460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:22,988-Speed 3100.81 samples/sec   Loss 13.4790   LearningRate 0.0756   Epoch: 2   Global Step: 32470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:26,276-Speed 3115.96 samples/sec   Loss 13.3677   LearningRate 0.0756   Epoch: 2   Global Step: 32480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:29,650-Speed 3035.49 samples/sec   Loss 13.2601   LearningRate 0.0756   Epoch: 2   Global Step: 32490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:32,943-Speed 3110.27 samples/sec   Loss 13.2224   LearningRate 0.0755   Epoch: 2   Global Step: 32500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:36,321-Speed 3033.05 samples/sec   Loss 13.2935   LearningRate 0.0755   Epoch: 2   Global Step: 32510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:39,626-Speed 3099.31 samples/sec   Loss 13.3237   LearningRate 0.0755   Epoch: 2   Global Step: 32520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:43,002-Speed 3033.95 samples/sec   Loss 13.3345   LearningRate 0.0755   Epoch: 2   Global Step: 32530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:46,359-Speed 3051.87 samples/sec   Loss 13.3242   LearningRate 0.0755   Epoch: 2   Global Step: 32540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:49,694-Speed 3071.82 samples/sec   Loss 13.3677   LearningRate 0.0755   Epoch: 2   Global Step: 32550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:52,995-Speed 3102.77 samples/sec   Loss 13.2102   LearningRate 0.0755   Epoch: 2   Global Step: 32560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:56,374-Speed 3030.97 samples/sec   Loss 13.2676   LearningRate 0.0755   Epoch: 2   Global Step: 32570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:47:59,681-Speed 3096.94 samples/sec   Loss 13.2322   LearningRate 0.0755   Epoch: 2   Global Step: 32580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:03,059-Speed 3032.50 samples/sec   Loss 13.2750   LearningRate 0.0755   Epoch: 2   Global Step: 32590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:06,358-Speed 3105.84 samples/sec   Loss 13.2238   LearningRate 0.0755   Epoch: 2   Global Step: 32600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:09,622-Speed 3138.37 samples/sec   Loss 13.4179   LearningRate 0.0755   Epoch: 2   Global Step: 32610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:12,967-Speed 3061.72 samples/sec   Loss 13.2399   LearningRate 0.0755   Epoch: 2   Global Step: 32620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:16,283-Speed 3088.88 samples/sec   Loss 13.3454   LearningRate 0.0755   Epoch: 2   Global Step: 32630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:19,577-Speed 3109.92 samples/sec   Loss 13.3401   LearningRate 0.0754   Epoch: 2   Global Step: 32640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:22,857-Speed 3122.74 samples/sec   Loss 13.2853   LearningRate 0.0754   Epoch: 2   Global Step: 32650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:26,168-Speed 3094.11 samples/sec   Loss 13.1725   LearningRate 0.0754   Epoch: 2   Global Step: 32660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:29,440-Speed 3130.14 samples/sec   Loss 13.1905   LearningRate 0.0754   Epoch: 2   Global Step: 32670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:32,768-Speed 3077.74 samples/sec   Loss 13.1586   LearningRate 0.0754   Epoch: 2   Global Step: 32680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:36,127-Speed 3049.67 samples/sec   Loss 13.1417   LearningRate 0.0754   Epoch: 2   Global Step: 32690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:39,471-Speed 3063.12 samples/sec   Loss 13.3151   LearningRate 0.0754   Epoch: 2   Global Step: 32700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:42,790-Speed 3085.79 samples/sec   Loss 13.2426   LearningRate 0.0754   Epoch: 2   Global Step: 32710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:46,113-Speed 3082.68 samples/sec   Loss 13.3770   LearningRate 0.0754   Epoch: 2   Global Step: 32720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:49,415-Speed 3101.96 samples/sec   Loss 13.3610   LearningRate 0.0754   Epoch: 2   Global Step: 32730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:52,764-Speed 3058.18 samples/sec   Loss 13.3202   LearningRate 0.0754   Epoch: 2   Global Step: 32740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:48:56,076-Speed 3092.89 samples/sec   Loss 13.3338   LearningRate 0.0754   Epoch: 2   Global Step: 32750   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 04:48:59,413-Speed 3069.94 samples/sec   Loss 13.2868   LearningRate 0.0754   Epoch: 2   Global Step: 32760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:02,742-Speed 3076.61 samples/sec   Loss 13.2781   LearningRate 0.0754   Epoch: 2   Global Step: 32770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:06,083-Speed 3066.08 samples/sec   Loss 13.2765   LearningRate 0.0754   Epoch: 2   Global Step: 32780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:09,371-Speed 3115.33 samples/sec   Loss 13.2617   LearningRate 0.0753   Epoch: 2   Global Step: 32790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:12,676-Speed 3098.89 samples/sec   Loss 13.3201   LearningRate 0.0753   Epoch: 2   Global Step: 32800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:15,975-Speed 3104.89 samples/sec   Loss 13.3497   LearningRate 0.0753   Epoch: 2   Global Step: 32810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:19,324-Speed 3059.08 samples/sec   Loss 13.4070   LearningRate 0.0753   Epoch: 2   Global Step: 32820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:22,621-Speed 3106.72 samples/sec   Loss 13.2580   LearningRate 0.0753   Epoch: 2   Global Step: 32830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:25,963-Speed 3064.93 samples/sec   Loss 13.1695   LearningRate 0.0753   Epoch: 2   Global Step: 32840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:29,289-Speed 3079.63 samples/sec   Loss 13.2328   LearningRate 0.0753   Epoch: 2   Global Step: 32850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:32,581-Speed 3111.10 samples/sec   Loss 13.3581   LearningRate 0.0753   Epoch: 2   Global Step: 32860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:35,863-Speed 3120.82 samples/sec   Loss 13.2442   LearningRate 0.0753   Epoch: 2   Global Step: 32870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:39,211-Speed 3059.98 samples/sec   Loss 13.1008   LearningRate 0.0753   Epoch: 2   Global Step: 32880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:42,602-Speed 3020.61 samples/sec   Loss 13.3015   LearningRate 0.0753   Epoch: 2   Global Step: 32890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:45,926-Speed 3081.30 samples/sec   Loss 13.3068   LearningRate 0.0753   Epoch: 2   Global Step: 32900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:49,290-Speed 3044.85 samples/sec   Loss 13.1647   LearningRate 0.0753   Epoch: 2   Global Step: 32910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:52,623-Speed 3073.28 samples/sec   Loss 13.0607   LearningRate 0.0753   Epoch: 2   Global Step: 32920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:55,911-Speed 3115.06 samples/sec   Loss 13.2826   LearningRate 0.0752   Epoch: 2   Global Step: 32930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:49:59,247-Speed 3071.28 samples/sec   Loss 13.1621   LearningRate 0.0752   Epoch: 2   Global Step: 32940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:02,596-Speed 3058.13 samples/sec   Loss 13.0954   LearningRate 0.0752   Epoch: 2   Global Step: 32950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:05,896-Speed 3103.60 samples/sec   Loss 13.3313   LearningRate 0.0752   Epoch: 2   Global Step: 32960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:09,202-Speed 3099.45 samples/sec   Loss 13.1210   LearningRate 0.0752   Epoch: 2   Global Step: 32970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:12,523-Speed 3084.18 samples/sec   Loss 13.3512   LearningRate 0.0752   Epoch: 2   Global Step: 32980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:15,870-Speed 3060.65 samples/sec   Loss 13.3226   LearningRate 0.0752   Epoch: 2   Global Step: 32990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:19,181-Speed 3093.44 samples/sec   Loss 13.2293   LearningRate 0.0752   Epoch: 2   Global Step: 33000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:22,476-Speed 3108.75 samples/sec   Loss 13.2434   LearningRate 0.0752   Epoch: 2   Global Step: 33010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:25,812-Speed 3070.76 samples/sec   Loss 13.2305   LearningRate 0.0752   Epoch: 2   Global Step: 33020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:29,122-Speed 3094.12 samples/sec   Loss 13.3315   LearningRate 0.0752   Epoch: 2   Global Step: 33030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:32,536-Speed 3000.37 samples/sec   Loss 13.3109   LearningRate 0.0752   Epoch: 2   Global Step: 33040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:35,841-Speed 3099.15 samples/sec   Loss 13.2170   LearningRate 0.0752   Epoch: 2   Global Step: 33050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:39,164-Speed 3082.93 samples/sec   Loss 13.3055   LearningRate 0.0752   Epoch: 2   Global Step: 33060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:42,526-Speed 3046.19 samples/sec   Loss 13.3099   LearningRate 0.0751   Epoch: 2   Global Step: 33070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:45,932-Speed 3007.39 samples/sec   Loss 13.2807   LearningRate 0.0751   Epoch: 2   Global Step: 33080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:49,299-Speed 3042.09 samples/sec   Loss 13.2328   LearningRate 0.0751   Epoch: 2   Global Step: 33090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:52,687-Speed 3023.63 samples/sec   Loss 13.2692   LearningRate 0.0751   Epoch: 2   Global Step: 33100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:55,947-Speed 3142.55 samples/sec   Loss 13.0934   LearningRate 0.0751   Epoch: 2   Global Step: 33110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:50:59,203-Speed 3145.77 samples/sec   Loss 13.2274   LearningRate 0.0751   Epoch: 2   Global Step: 33120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:02,527-Speed 3081.57 samples/sec   Loss 13.1820   LearningRate 0.0751   Epoch: 2   Global Step: 33130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:05,867-Speed 3066.42 samples/sec   Loss 13.2117   LearningRate 0.0751   Epoch: 2   Global Step: 33140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:09,175-Speed 3096.91 samples/sec   Loss 13.1509   LearningRate 0.0751   Epoch: 2   Global Step: 33150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:12,484-Speed 3095.24 samples/sec   Loss 13.2397   LearningRate 0.0751   Epoch: 2   Global Step: 33160   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 04:51:15,863-Speed 3032.08 samples/sec   Loss 13.3193   LearningRate 0.0751   Epoch: 2   Global Step: 33170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:19,222-Speed 3049.67 samples/sec   Loss 13.3636   LearningRate 0.0751   Epoch: 2   Global Step: 33180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:22,567-Speed 3061.71 samples/sec   Loss 13.2348   LearningRate 0.0751   Epoch: 2   Global Step: 33190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:25,939-Speed 3038.33 samples/sec   Loss 13.3891   LearningRate 0.0751   Epoch: 2   Global Step: 33200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:29,291-Speed 3055.43 samples/sec   Loss 13.1850   LearningRate 0.0751   Epoch: 2   Global Step: 33210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:32,626-Speed 3071.92 samples/sec   Loss 13.0829   LearningRate 0.0750   Epoch: 2   Global Step: 33220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:35,940-Speed 3090.15 samples/sec   Loss 13.2218   LearningRate 0.0750   Epoch: 2   Global Step: 33230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:39,332-Speed 3019.95 samples/sec   Loss 13.2636   LearningRate 0.0750   Epoch: 2   Global Step: 33240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:42,684-Speed 3055.82 samples/sec   Loss 13.1769   LearningRate 0.0750   Epoch: 2   Global Step: 33250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:46,113-Speed 2986.83 samples/sec   Loss 13.3360   LearningRate 0.0750   Epoch: 2   Global Step: 33260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:49,465-Speed 3056.29 samples/sec   Loss 13.1496   LearningRate 0.0750   Epoch: 2   Global Step: 33270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:52,808-Speed 3064.21 samples/sec   Loss 13.1823   LearningRate 0.0750   Epoch: 2   Global Step: 33280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:56,191-Speed 3027.35 samples/sec   Loss 13.1472   LearningRate 0.0750   Epoch: 2   Global Step: 33290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:51:59,549-Speed 3050.39 samples/sec   Loss 13.2135   LearningRate 0.0750   Epoch: 2   Global Step: 33300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:02,851-Speed 3102.55 samples/sec   Loss 13.1356   LearningRate 0.0750   Epoch: 2   Global Step: 33310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:06,209-Speed 3049.82 samples/sec   Loss 13.2167   LearningRate 0.0750   Epoch: 2   Global Step: 33320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:09,486-Speed 3126.18 samples/sec   Loss 13.1417   LearningRate 0.0750   Epoch: 2   Global Step: 33330   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:12,860-Speed 3035.16 samples/sec   Loss 13.1100   LearningRate 0.0750   Epoch: 2   Global Step: 33340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:16,208-Speed 3059.98 samples/sec   Loss 13.2857   LearningRate 0.0750   Epoch: 2   Global Step: 33350   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:19,515-Speed 3097.18 samples/sec   Loss 13.1521   LearningRate 0.0749   Epoch: 2   Global Step: 33360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:22,837-Speed 3083.41 samples/sec   Loss 13.2718   LearningRate 0.0749   Epoch: 2   Global Step: 33370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:26,199-Speed 3046.54 samples/sec   Loss 13.1951   LearningRate 0.0749   Epoch: 2   Global Step: 33380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:29,471-Speed 3130.09 samples/sec   Loss 13.2699   LearningRate 0.0749   Epoch: 2   Global Step: 33390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:32,783-Speed 3092.98 samples/sec   Loss 13.2999   LearningRate 0.0749   Epoch: 2   Global Step: 33400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:36,171-Speed 3023.16 samples/sec   Loss 13.2870   LearningRate 0.0749   Epoch: 2   Global Step: 33410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:39,558-Speed 3024.80 samples/sec   Loss 13.2108   LearningRate 0.0749   Epoch: 2   Global Step: 33420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:42,867-Speed 3094.65 samples/sec   Loss 13.1141   LearningRate 0.0749   Epoch: 2   Global Step: 33430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:46,199-Speed 3074.00 samples/sec   Loss 13.3297   LearningRate 0.0749   Epoch: 2   Global Step: 33440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:49,490-Speed 3113.46 samples/sec   Loss 13.2736   LearningRate 0.0749   Epoch: 2   Global Step: 33450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:52,822-Speed 3074.90 samples/sec   Loss 13.1716   LearningRate 0.0749   Epoch: 2   Global Step: 33460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:56,156-Speed 3072.78 samples/sec   Loss 13.2842   LearningRate 0.0749   Epoch: 2   Global Step: 33470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:52:59,485-Speed 3076.33 samples/sec   Loss 13.2464   LearningRate 0.0749   Epoch: 2   Global Step: 33480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:02,772-Speed 3116.37 samples/sec   Loss 13.1235   LearningRate 0.0749   Epoch: 2   Global Step: 33490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:06,081-Speed 3095.70 samples/sec   Loss 13.1912   LearningRate 0.0748   Epoch: 2   Global Step: 33500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:09,359-Speed 3124.36 samples/sec   Loss 13.0870   LearningRate 0.0748   Epoch: 2   Global Step: 33510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:12,688-Speed 3076.89 samples/sec   Loss 13.1832   LearningRate 0.0748   Epoch: 2   Global Step: 33520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:16,033-Speed 3062.14 samples/sec   Loss 13.3642   LearningRate 0.0748   Epoch: 2   Global Step: 33530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:19,375-Speed 3065.48 samples/sec   Loss 13.3165   LearningRate 0.0748   Epoch: 2   Global Step: 33540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:22,726-Speed 3056.60 samples/sec   Loss 13.1651   LearningRate 0.0748   Epoch: 2   Global Step: 33550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:26,032-Speed 3098.47 samples/sec   Loss 13.2866   LearningRate 0.0748   Epoch: 2   Global Step: 33560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:29,299-Speed 3134.99 samples/sec   Loss 13.1866   LearningRate 0.0748   Epoch: 2   Global Step: 33570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:32,641-Speed 3065.28 samples/sec   Loss 13.2455   LearningRate 0.0748   Epoch: 2   Global Step: 33580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:35,950-Speed 3095.09 samples/sec   Loss 13.2481   LearningRate 0.0748   Epoch: 2   Global Step: 33590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:39,274-Speed 3081.41 samples/sec   Loss 13.1498   LearningRate 0.0748   Epoch: 2   Global Step: 33600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:42,603-Speed 3077.59 samples/sec   Loss 13.2521   LearningRate 0.0748   Epoch: 2   Global Step: 33610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:45,972-Speed 3039.63 samples/sec   Loss 13.0684   LearningRate 0.0748   Epoch: 2   Global Step: 33620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:49,292-Speed 3085.86 samples/sec   Loss 13.1607   LearningRate 0.0748   Epoch: 2   Global Step: 33630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:52,648-Speed 3052.39 samples/sec   Loss 13.2167   LearningRate 0.0748   Epoch: 2   Global Step: 33640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:55,959-Speed 3093.66 samples/sec   Loss 13.1804   LearningRate 0.0747   Epoch: 2   Global Step: 33650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:53:59,314-Speed 3052.65 samples/sec   Loss 13.1942   LearningRate 0.0747   Epoch: 2   Global Step: 33660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:54:02,658-Speed 3063.04 samples/sec   Loss 13.1234   LearningRate 0.0747   Epoch: 2   Global Step: 33670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:54:05,994-Speed 3070.34 samples/sec   Loss 13.0450   LearningRate 0.0747   Epoch: 2   Global Step: 33680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:54:09,330-Speed 3071.89 samples/sec   Loss 13.0391   LearningRate 0.0747   Epoch: 2   Global Step: 33690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:54:12,621-Speed 3112.33 samples/sec   Loss 13.2307   LearningRate 0.0747   Epoch: 2   Global Step: 33700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:54:16,000-Speed 3030.75 samples/sec   Loss 13.2446   LearningRate 0.0747   Epoch: 2   Global Step: 33710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:54:19,409-Speed 3005.01 samples/sec   Loss 13.1229   LearningRate 0.0747   Epoch: 2   Global Step: 33720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:54:22,698-Speed 3114.49 samples/sec   Loss 13.2272   LearningRate 0.0747   Epoch: 2   Global Step: 33730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:54:26,080-Speed 3028.49 samples/sec   Loss 13.1860   LearningRate 0.0747   Epoch: 2   Global Step: 33740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:54:29,403-Speed 3082.96 samples/sec   Loss 13.0774   LearningRate 0.0747   Epoch: 2   Global Step: 33750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:54:32,751-Speed 3058.92 samples/sec   Loss 13.0600   LearningRate 0.0747   Epoch: 2   Global Step: 33760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:54:36,119-Speed 3041.43 samples/sec   Loss 13.2193   LearningRate 0.0747   Epoch: 2   Global Step: 33770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:54:39,522-Speed 3009.94 samples/sec   Loss 13.0191   LearningRate 0.0747   Epoch: 2   Global Step: 33780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:54:42,902-Speed 3030.90 samples/sec   Loss 13.2579   LearningRate 0.0746   Epoch: 2   Global Step: 33790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:54:46,295-Speed 3017.98 samples/sec   Loss 13.3087   LearningRate 0.0746   Epoch: 2   Global Step: 33800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:54:49,592-Speed 3107.07 samples/sec   Loss 13.2166   LearningRate 0.0746   Epoch: 2   Global Step: 33810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:54:52,903-Speed 3093.72 samples/sec   Loss 13.2848   LearningRate 0.0746   Epoch: 2   Global Step: 33820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:54:56,269-Speed 3043.49 samples/sec   Loss 13.1733   LearningRate 0.0746   Epoch: 2   Global Step: 33830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:54:59,573-Speed 3100.24 samples/sec   Loss 13.1510   LearningRate 0.0746   Epoch: 2   Global Step: 33840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:02,839-Speed 3135.94 samples/sec   Loss 13.2091   LearningRate 0.0746   Epoch: 2   Global Step: 33850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:06,139-Speed 3104.16 samples/sec   Loss 13.2473   LearningRate 0.0746   Epoch: 2   Global Step: 33860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:09,562-Speed 2992.17 samples/sec   Loss 13.1724   LearningRate 0.0746   Epoch: 2   Global Step: 33870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:12,858-Speed 3107.64 samples/sec   Loss 13.2324   LearningRate 0.0746   Epoch: 2   Global Step: 33880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:16,249-Speed 3020.66 samples/sec   Loss 13.2020   LearningRate 0.0746   Epoch: 2   Global Step: 33890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:19,592-Speed 3064.21 samples/sec   Loss 13.3491   LearningRate 0.0746   Epoch: 2   Global Step: 33900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:22,894-Speed 3102.36 samples/sec   Loss 13.1672   LearningRate 0.0746   Epoch: 2   Global Step: 33910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:26,283-Speed 3021.82 samples/sec   Loss 13.2964   LearningRate 0.0746   Epoch: 2   Global Step: 33920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:29,634-Speed 3057.62 samples/sec   Loss 13.2988   LearningRate 0.0745   Epoch: 2   Global Step: 33930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:32,994-Speed 3048.33 samples/sec   Loss 13.1517   LearningRate 0.0745   Epoch: 2   Global Step: 33940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:36,377-Speed 3027.59 samples/sec   Loss 13.1566   LearningRate 0.0745   Epoch: 2   Global Step: 33950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:39,772-Speed 3017.40 samples/sec   Loss 13.0291   LearningRate 0.0745   Epoch: 2   Global Step: 33960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:43,090-Speed 3086.74 samples/sec   Loss 13.0850   LearningRate 0.0745   Epoch: 2   Global Step: 33970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:46,441-Speed 3056.38 samples/sec   Loss 13.0778   LearningRate 0.0745   Epoch: 2   Global Step: 33980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:49,801-Speed 3048.80 samples/sec   Loss 13.2742   LearningRate 0.0745   Epoch: 2   Global Step: 33990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:53,152-Speed 3056.71 samples/sec   Loss 13.1367   LearningRate 0.0745   Epoch: 2   Global Step: 34000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:56,439-Speed 3116.27 samples/sec   Loss 13.0975   LearningRate 0.0745   Epoch: 2   Global Step: 34010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:55:59,773-Speed 3072.52 samples/sec   Loss 13.1594   LearningRate 0.0745   Epoch: 2   Global Step: 34020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:03,060-Speed 3116.02 samples/sec   Loss 13.2431   LearningRate 0.0745   Epoch: 2   Global Step: 34030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:06,379-Speed 3086.27 samples/sec   Loss 13.2245   LearningRate 0.0745   Epoch: 2   Global Step: 34040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:09,681-Speed 3102.91 samples/sec   Loss 13.1897   LearningRate 0.0745   Epoch: 2   Global Step: 34050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:13,014-Speed 3073.27 samples/sec   Loss 13.1750   LearningRate 0.0745   Epoch: 2   Global Step: 34060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:16,308-Speed 3109.55 samples/sec   Loss 13.1302   LearningRate 0.0745   Epoch: 2   Global Step: 34070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:19,646-Speed 3068.00 samples/sec   Loss 13.0939   LearningRate 0.0744   Epoch: 2   Global Step: 34080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:23,000-Speed 3054.12 samples/sec   Loss 13.2441   LearningRate 0.0744   Epoch: 2   Global Step: 34090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:26,326-Speed 3079.77 samples/sec   Loss 12.9704   LearningRate 0.0744   Epoch: 2   Global Step: 34100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:29,639-Speed 3092.38 samples/sec   Loss 13.2604   LearningRate 0.0744   Epoch: 2   Global Step: 34110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:32,961-Speed 3083.42 samples/sec   Loss 13.1384   LearningRate 0.0744   Epoch: 2   Global Step: 34120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:36,326-Speed 3044.13 samples/sec   Loss 13.4172   LearningRate 0.0744   Epoch: 2   Global Step: 34130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:39,651-Speed 3080.43 samples/sec   Loss 13.1556   LearningRate 0.0744   Epoch: 2   Global Step: 34140   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 04:56:42,936-Speed 3117.89 samples/sec   Loss 13.1813   LearningRate 0.0744   Epoch: 2   Global Step: 34150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:46,265-Speed 3076.68 samples/sec   Loss 13.1161   LearningRate 0.0744   Epoch: 2   Global Step: 34160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:49,603-Speed 3068.84 samples/sec   Loss 13.1920   LearningRate 0.0744   Epoch: 2   Global Step: 34170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:52,939-Speed 3070.84 samples/sec   Loss 13.1335   LearningRate 0.0744   Epoch: 2   Global Step: 34180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:56,278-Speed 3067.40 samples/sec   Loss 13.1989   LearningRate 0.0744   Epoch: 2   Global Step: 34190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:56:59,618-Speed 3067.78 samples/sec   Loss 13.1768   LearningRate 0.0744   Epoch: 2   Global Step: 34200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:57:03,044-Speed 2990.11 samples/sec   Loss 13.0319   LearningRate 0.0744   Epoch: 2   Global Step: 34210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:57:06,367-Speed 3082.43 samples/sec   Loss 13.1149   LearningRate 0.0743   Epoch: 2   Global Step: 34220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:57:09,711-Speed 3062.99 samples/sec   Loss 13.0723   LearningRate 0.0743   Epoch: 2   Global Step: 34230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:57:13,001-Speed 3113.44 samples/sec   Loss 13.0069   LearningRate 0.0743   Epoch: 2   Global Step: 34240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:57:16,315-Speed 3090.51 samples/sec   Loss 13.0198   LearningRate 0.0743   Epoch: 2   Global Step: 34250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:19,660-Speed 3062.55 samples/sec   Loss 12.8689   LearningRate 0.0743   Epoch: 2   Global Step: 34260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:22,950-Speed 3113.13 samples/sec   Loss 12.9774   LearningRate 0.0743   Epoch: 2   Global Step: 34270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:26,292-Speed 3065.10 samples/sec   Loss 13.3327   LearningRate 0.0743   Epoch: 2   Global Step: 34280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:29,603-Speed 3093.71 samples/sec   Loss 13.1194   LearningRate 0.0743   Epoch: 2   Global Step: 34290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:32,914-Speed 3093.46 samples/sec   Loss 13.2248   LearningRate 0.0743   Epoch: 2   Global Step: 34300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:36,202-Speed 3115.14 samples/sec   Loss 13.1331   LearningRate 0.0743   Epoch: 2   Global Step: 34310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:39,480-Speed 3125.55 samples/sec   Loss 13.1304   LearningRate 0.0743   Epoch: 2   Global Step: 34320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:42,759-Speed 3123.77 samples/sec   Loss 13.2620   LearningRate 0.0743   Epoch: 2   Global Step: 34330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:46,105-Speed 3061.21 samples/sec   Loss 13.0486   LearningRate 0.0743   Epoch: 2   Global Step: 34340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:49,405-Speed 3104.02 samples/sec   Loss 13.1535   LearningRate 0.0743   Epoch: 2   Global Step: 34350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:52,699-Speed 3109.02 samples/sec   Loss 13.1834   LearningRate 0.0743   Epoch: 2   Global Step: 34360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:55,999-Speed 3104.40 samples/sec   Loss 13.1164   LearningRate 0.0742   Epoch: 2   Global Step: 34370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:57:59,346-Speed 3060.14 samples/sec   Loss 13.1539   LearningRate 0.0742   Epoch: 2   Global Step: 34380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:58:02,678-Speed 3074.27 samples/sec   Loss 13.1674   LearningRate 0.0742   Epoch: 2   Global Step: 34390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:58:05,977-Speed 3105.53 samples/sec   Loss 12.9516   LearningRate 0.0742   Epoch: 2   Global Step: 34400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:58:09,324-Speed 3059.51 samples/sec   Loss 13.0487   LearningRate 0.0742   Epoch: 2   Global Step: 34410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:58:12,628-Speed 3100.27 samples/sec   Loss 13.2246   LearningRate 0.0742   Epoch: 2   Global Step: 34420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:58:15,967-Speed 3067.22 samples/sec   Loss 13.1338   LearningRate 0.0742   Epoch: 2   Global Step: 34430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:58:19,290-Speed 3082.95 samples/sec   Loss 13.2509   LearningRate 0.0742   Epoch: 2   Global Step: 34440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 04:58:22,714-Speed 2991.07 samples/sec   Loss 13.0597   LearningRate 0.0742   Epoch: 2   Global Step: 34450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:58:26,084-Speed 3039.95 samples/sec   Loss 13.0987   LearningRate 0.0742   Epoch: 2   Global Step: 34460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:58:29,428-Speed 3062.76 samples/sec   Loss 13.2625   LearningRate 0.0742   Epoch: 2   Global Step: 34470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:58:32,752-Speed 3081.24 samples/sec   Loss 13.1655   LearningRate 0.0742   Epoch: 2   Global Step: 34480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:58:36,097-Speed 3063.21 samples/sec   Loss 12.9619   LearningRate 0.0742   Epoch: 2   Global Step: 34490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:58:39,461-Speed 3045.14 samples/sec   Loss 12.9639   LearningRate 0.0742   Epoch: 2   Global Step: 34500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:58:42,781-Speed 3084.93 samples/sec   Loss 12.9820   LearningRate 0.0741   Epoch: 2   Global Step: 34510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:58:46,084-Speed 3100.96 samples/sec   Loss 13.1020   LearningRate 0.0741   Epoch: 2   Global Step: 34520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:58:49,413-Speed 3077.22 samples/sec   Loss 13.0925   LearningRate 0.0741   Epoch: 2   Global Step: 34530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:58:52,745-Speed 3073.80 samples/sec   Loss 13.1557   LearningRate 0.0741   Epoch: 2   Global Step: 34540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:58:56,016-Speed 3131.72 samples/sec   Loss 12.9970   LearningRate 0.0741   Epoch: 2   Global Step: 34550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:58:59,311-Speed 3110.41 samples/sec   Loss 13.0702   LearningRate 0.0741   Epoch: 2   Global Step: 34560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:02,605-Speed 3109.50 samples/sec   Loss 12.9972   LearningRate 0.0741   Epoch: 2   Global Step: 34570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:05,871-Speed 3137.28 samples/sec   Loss 13.1046   LearningRate 0.0741   Epoch: 2   Global Step: 34580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:09,175-Speed 3099.79 samples/sec   Loss 13.2619   LearningRate 0.0741   Epoch: 2   Global Step: 34590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:12,503-Speed 3077.42 samples/sec   Loss 13.1754   LearningRate 0.0741   Epoch: 2   Global Step: 34600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:15,818-Speed 3090.41 samples/sec   Loss 13.1520   LearningRate 0.0741   Epoch: 2   Global Step: 34610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:19,172-Speed 3054.11 samples/sec   Loss 13.0169   LearningRate 0.0741   Epoch: 2   Global Step: 34620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:22,483-Speed 3093.91 samples/sec   Loss 13.0646   LearningRate 0.0741   Epoch: 2   Global Step: 34630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:25,752-Speed 3133.01 samples/sec   Loss 13.1495   LearningRate 0.0741   Epoch: 2   Global Step: 34640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:29,034-Speed 3120.64 samples/sec   Loss 13.2383   LearningRate 0.0740   Epoch: 2   Global Step: 34650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:32,363-Speed 3077.53 samples/sec   Loss 12.9846   LearningRate 0.0740   Epoch: 2   Global Step: 34660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:35,657-Speed 3109.43 samples/sec   Loss 13.0861   LearningRate 0.0740   Epoch: 2   Global Step: 34670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:38,971-Speed 3090.86 samples/sec   Loss 13.2861   LearningRate 0.0740   Epoch: 2   Global Step: 34680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:42,319-Speed 3059.28 samples/sec   Loss 13.1531   LearningRate 0.0740   Epoch: 2   Global Step: 34690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:45,585-Speed 3136.30 samples/sec   Loss 13.1684   LearningRate 0.0740   Epoch: 2   Global Step: 34700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:48,938-Speed 3054.45 samples/sec   Loss 13.1490   LearningRate 0.0740   Epoch: 2   Global Step: 34710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:52,278-Speed 3067.55 samples/sec   Loss 13.0548   LearningRate 0.0740   Epoch: 2   Global Step: 34720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:55,607-Speed 3076.45 samples/sec   Loss 13.1122   LearningRate 0.0740   Epoch: 2   Global Step: 34730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 04:59:58,936-Speed 3076.94 samples/sec   Loss 13.1256   LearningRate 0.0740   Epoch: 2   Global Step: 34740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:00:02,278-Speed 3064.95 samples/sec   Loss 13.0874   LearningRate 0.0740   Epoch: 2   Global Step: 34750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:00:05,551-Speed 3130.21 samples/sec   Loss 13.1554   LearningRate 0.0740   Epoch: 2   Global Step: 34760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:00:08,895-Speed 3062.50 samples/sec   Loss 13.0342   LearningRate 0.0740   Epoch: 2   Global Step: 34770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:00:12,227-Speed 3074.53 samples/sec   Loss 13.2825   LearningRate 0.0740   Epoch: 2   Global Step: 34780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:00:15,551-Speed 3081.47 samples/sec   Loss 13.0297   LearningRate 0.0740   Epoch: 2   Global Step: 34790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:00:18,858-Speed 3097.62 samples/sec   Loss 13.1795   LearningRate 0.0739   Epoch: 2   Global Step: 34800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:00:22,248-Speed 3021.18 samples/sec   Loss 12.9859   LearningRate 0.0739   Epoch: 2   Global Step: 34810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:00:25,563-Speed 3090.65 samples/sec   Loss 13.2028   LearningRate 0.0739   Epoch: 2   Global Step: 34820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:00:28,903-Speed 3065.91 samples/sec   Loss 13.2642   LearningRate 0.0739   Epoch: 2   Global Step: 34830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:00:32,220-Speed 3088.75 samples/sec   Loss 13.0566   LearningRate 0.0739   Epoch: 2   Global Step: 34840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:00:35,517-Speed 3106.87 samples/sec   Loss 13.0053   LearningRate 0.0739   Epoch: 2   Global Step: 34850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:00:38,825-Speed 3097.04 samples/sec   Loss 13.0329   LearningRate 0.0739   Epoch: 2   Global Step: 34860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:00:42,103-Speed 3124.53 samples/sec   Loss 13.1280   LearningRate 0.0739   Epoch: 2   Global Step: 34870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:00:45,416-Speed 3091.72 samples/sec   Loss 13.2612   LearningRate 0.0739   Epoch: 2   Global Step: 34880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:00:48,719-Speed 3100.58 samples/sec   Loss 13.1161   LearningRate 0.0739   Epoch: 2   Global Step: 34890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:00:52,065-Speed 3061.53 samples/sec   Loss 13.1564   LearningRate 0.0739   Epoch: 2   Global Step: 34900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:00:55,443-Speed 3033.97 samples/sec   Loss 13.0988   LearningRate 0.0739   Epoch: 2   Global Step: 34910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:00:58,724-Speed 3121.60 samples/sec   Loss 13.0019   LearningRate 0.0739   Epoch: 2   Global Step: 34920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:02,043-Speed 3086.62 samples/sec   Loss 13.2048   LearningRate 0.0739   Epoch: 2   Global Step: 34930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:05,326-Speed 3119.45 samples/sec   Loss 12.9974   LearningRate 0.0738   Epoch: 2   Global Step: 34940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:08,664-Speed 3069.28 samples/sec   Loss 13.1391   LearningRate 0.0738   Epoch: 2   Global Step: 34950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:11,971-Speed 3096.73 samples/sec   Loss 13.0206   LearningRate 0.0738   Epoch: 2   Global Step: 34960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:15,307-Speed 3071.14 samples/sec   Loss 13.2023   LearningRate 0.0738   Epoch: 2   Global Step: 34970   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 05:01:18,606-Speed 3104.13 samples/sec   Loss 13.0538   LearningRate 0.0738   Epoch: 2   Global Step: 34980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:21,956-Speed 3058.35 samples/sec   Loss 13.1141   LearningRate 0.0738   Epoch: 2   Global Step: 34990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:25,286-Speed 3075.11 samples/sec   Loss 13.2038   LearningRate 0.0738   Epoch: 2   Global Step: 35000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:28,574-Speed 3116.44 samples/sec   Loss 13.0730   LearningRate 0.0738   Epoch: 2   Global Step: 35010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:31,923-Speed 3057.78 samples/sec   Loss 13.0248   LearningRate 0.0738   Epoch: 2   Global Step: 35020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:35,232-Speed 3095.29 samples/sec   Loss 12.9567   LearningRate 0.0738   Epoch: 2   Global Step: 35030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:38,606-Speed 3036.17 samples/sec   Loss 13.0753   LearningRate 0.0738   Epoch: 2   Global Step: 35040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:41,926-Speed 3085.15 samples/sec   Loss 13.0422   LearningRate 0.0738   Epoch: 2   Global Step: 35050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:45,241-Speed 3090.27 samples/sec   Loss 13.0883   LearningRate 0.0738   Epoch: 2   Global Step: 35060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:48,590-Speed 3058.67 samples/sec   Loss 12.9390   LearningRate 0.0738   Epoch: 2   Global Step: 35070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:51,920-Speed 3076.84 samples/sec   Loss 13.0201   LearningRate 0.0738   Epoch: 2   Global Step: 35080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:55,288-Speed 3041.20 samples/sec   Loss 12.9734   LearningRate 0.0737   Epoch: 2   Global Step: 35090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:01:58,611-Speed 3082.16 samples/sec   Loss 13.2068   LearningRate 0.0737   Epoch: 2   Global Step: 35100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:01,889-Speed 3125.00 samples/sec   Loss 13.1131   LearningRate 0.0737   Epoch: 2   Global Step: 35110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:05,230-Speed 3065.91 samples/sec   Loss 12.9593   LearningRate 0.0737   Epoch: 2   Global Step: 35120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:08,564-Speed 3071.34 samples/sec   Loss 13.1215   LearningRate 0.0737   Epoch: 2   Global Step: 35130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:11,945-Speed 3029.88 samples/sec   Loss 12.9352   LearningRate 0.0737   Epoch: 2   Global Step: 35140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:15,344-Speed 3013.85 samples/sec   Loss 13.0940   LearningRate 0.0737   Epoch: 2   Global Step: 35150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:18,698-Speed 3053.32 samples/sec   Loss 12.9241   LearningRate 0.0737   Epoch: 2   Global Step: 35160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:21,984-Speed 3117.55 samples/sec   Loss 13.0949   LearningRate 0.0737   Epoch: 2   Global Step: 35170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:25,348-Speed 3045.55 samples/sec   Loss 13.0184   LearningRate 0.0737   Epoch: 2   Global Step: 35180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:28,685-Speed 3069.24 samples/sec   Loss 13.2305   LearningRate 0.0737   Epoch: 2   Global Step: 35190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:32,019-Speed 3071.96 samples/sec   Loss 12.9642   LearningRate 0.0737   Epoch: 2   Global Step: 35200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:35,349-Speed 3076.59 samples/sec   Loss 12.9550   LearningRate 0.0737   Epoch: 2   Global Step: 35210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:38,649-Speed 3103.64 samples/sec   Loss 12.9185   LearningRate 0.0737   Epoch: 2   Global Step: 35220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:41,964-Speed 3089.65 samples/sec   Loss 13.1760   LearningRate 0.0736   Epoch: 2   Global Step: 35230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:45,296-Speed 3074.89 samples/sec   Loss 13.0833   LearningRate 0.0736   Epoch: 2   Global Step: 35240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:48,605-Speed 3095.50 samples/sec   Loss 13.0108   LearningRate 0.0736   Epoch: 2   Global Step: 35250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:51,920-Speed 3089.26 samples/sec   Loss 12.9675   LearningRate 0.0736   Epoch: 2   Global Step: 35260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:55,212-Speed 3111.87 samples/sec   Loss 13.0593   LearningRate 0.0736   Epoch: 2   Global Step: 35270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:02:58,539-Speed 3078.99 samples/sec   Loss 12.9757   LearningRate 0.0736   Epoch: 2   Global Step: 35280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:03:01,836-Speed 3106.22 samples/sec   Loss 13.0680   LearningRate 0.0736   Epoch: 2   Global Step: 35290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:03:05,149-Speed 3091.86 samples/sec   Loss 12.9848   LearningRate 0.0736   Epoch: 2   Global Step: 35300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:03:08,417-Speed 3134.36 samples/sec   Loss 13.0243   LearningRate 0.0736   Epoch: 2   Global Step: 35310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:03:11,728-Speed 3093.26 samples/sec   Loss 13.0505   LearningRate 0.0736   Epoch: 2   Global Step: 35320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:03:15,067-Speed 3068.48 samples/sec   Loss 12.9987   LearningRate 0.0736   Epoch: 2   Global Step: 35330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:03:18,403-Speed 3069.68 samples/sec   Loss 13.0375   LearningRate 0.0736   Epoch: 2   Global Step: 35340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:03:21,776-Speed 3037.01 samples/sec   Loss 13.0925   LearningRate 0.0736   Epoch: 2   Global Step: 35350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:03:25,052-Speed 3126.64 samples/sec   Loss 13.1516   LearningRate 0.0736   Epoch: 2   Global Step: 35360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:03:28,338-Speed 3117.42 samples/sec   Loss 12.9971   LearningRate 0.0736   Epoch: 2   Global Step: 35370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:03:31,664-Speed 3079.59 samples/sec   Loss 13.0410   LearningRate 0.0735   Epoch: 2   Global Step: 35380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:03:34,971-Speed 3097.59 samples/sec   Loss 13.0062   LearningRate 0.0735   Epoch: 2   Global Step: 35390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:03:38,270-Speed 3104.49 samples/sec   Loss 13.0327   LearningRate 0.0735   Epoch: 2   Global Step: 35400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:03:41,571-Speed 3102.99 samples/sec   Loss 12.9810   LearningRate 0.0735   Epoch: 2   Global Step: 35410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:03:44,857-Speed 3118.21 samples/sec   Loss 13.0263   LearningRate 0.0735   Epoch: 2   Global Step: 35420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:03:48,170-Speed 3091.06 samples/sec   Loss 13.0954   LearningRate 0.0735   Epoch: 2   Global Step: 35430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:03:51,466-Speed 3107.82 samples/sec   Loss 12.9053   LearningRate 0.0735   Epoch: 2   Global Step: 35440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:03:54,808-Speed 3065.30 samples/sec   Loss 12.9550   LearningRate 0.0735   Epoch: 2   Global Step: 35450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:03:58,114-Speed 3097.75 samples/sec   Loss 13.2120   LearningRate 0.0735   Epoch: 2   Global Step: 35460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:01,423-Speed 3095.61 samples/sec   Loss 13.1242   LearningRate 0.0735   Epoch: 2   Global Step: 35470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:04,802-Speed 3034.94 samples/sec   Loss 12.9201   LearningRate 0.0735   Epoch: 2   Global Step: 35480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:08,134-Speed 3073.38 samples/sec   Loss 13.0782   LearningRate 0.0735   Epoch: 2   Global Step: 35490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:11,438-Speed 3100.12 samples/sec   Loss 12.9077   LearningRate 0.0735   Epoch: 2   Global Step: 35500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:14,786-Speed 3059.93 samples/sec   Loss 13.0249   LearningRate 0.0735   Epoch: 2   Global Step: 35510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:18,166-Speed 3030.80 samples/sec   Loss 13.0155   LearningRate 0.0734   Epoch: 2   Global Step: 35520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:21,422-Speed 3146.43 samples/sec   Loss 13.2444   LearningRate 0.0734   Epoch: 2   Global Step: 35530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:24,704-Speed 3120.57 samples/sec   Loss 13.0738   LearningRate 0.0734   Epoch: 2   Global Step: 35540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:28,011-Speed 3097.39 samples/sec   Loss 12.9534   LearningRate 0.0734   Epoch: 2   Global Step: 35550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:31,275-Speed 3138.49 samples/sec   Loss 12.9029   LearningRate 0.0734   Epoch: 2   Global Step: 35560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:34,557-Speed 3121.13 samples/sec   Loss 12.9086   LearningRate 0.0734   Epoch: 2   Global Step: 35570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:37,822-Speed 3137.25 samples/sec   Loss 12.8643   LearningRate 0.0734   Epoch: 2   Global Step: 35580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:41,120-Speed 3106.30 samples/sec   Loss 13.0320   LearningRate 0.0734   Epoch: 2   Global Step: 35590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:44,386-Speed 3135.46 samples/sec   Loss 12.9854   LearningRate 0.0734   Epoch: 2   Global Step: 35600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:47,702-Speed 3089.99 samples/sec   Loss 13.0077   LearningRate 0.0734   Epoch: 2   Global Step: 35610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:50,979-Speed 3125.07 samples/sec   Loss 13.0629   LearningRate 0.0734   Epoch: 2   Global Step: 35620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:54,273-Speed 3110.52 samples/sec   Loss 12.9643   LearningRate 0.0734   Epoch: 2   Global Step: 35630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:04:57,618-Speed 3061.58 samples/sec   Loss 13.0042   LearningRate 0.0734   Epoch: 2   Global Step: 35640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:00,941-Speed 3082.79 samples/sec   Loss 12.8403   LearningRate 0.0734   Epoch: 2   Global Step: 35650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:04,327-Speed 3024.68 samples/sec   Loss 13.0366   LearningRate 0.0734   Epoch: 2   Global Step: 35660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:07,638-Speed 3093.41 samples/sec   Loss 12.8554   LearningRate 0.0733   Epoch: 2   Global Step: 35670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:10,990-Speed 3056.38 samples/sec   Loss 12.9044   LearningRate 0.0733   Epoch: 2   Global Step: 35680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:14,387-Speed 3015.11 samples/sec   Loss 13.0877   LearningRate 0.0733   Epoch: 2   Global Step: 35690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:17,698-Speed 3094.09 samples/sec   Loss 12.9246   LearningRate 0.0733   Epoch: 2   Global Step: 35700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:21,027-Speed 3076.39 samples/sec   Loss 12.9905   LearningRate 0.0733   Epoch: 2   Global Step: 35710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:24,436-Speed 3005.35 samples/sec   Loss 12.9176   LearningRate 0.0733   Epoch: 2   Global Step: 35720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:27,707-Speed 3130.89 samples/sec   Loss 12.9838   LearningRate 0.0733   Epoch: 2   Global Step: 35730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:30,978-Speed 3131.41 samples/sec   Loss 12.9956   LearningRate 0.0733   Epoch: 2   Global Step: 35740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:34,297-Speed 3087.07 samples/sec   Loss 12.9231   LearningRate 0.0733   Epoch: 2   Global Step: 35750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:37,607-Speed 3094.79 samples/sec   Loss 13.0684   LearningRate 0.0733   Epoch: 2   Global Step: 35760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:40,999-Speed 3019.33 samples/sec   Loss 13.1908   LearningRate 0.0733   Epoch: 2   Global Step: 35770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:44,337-Speed 3068.95 samples/sec   Loss 13.0441   LearningRate 0.0733   Epoch: 2   Global Step: 35780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:47,726-Speed 3021.78 samples/sec   Loss 13.0811   LearningRate 0.0733   Epoch: 2   Global Step: 35790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:51,067-Speed 3066.00 samples/sec   Loss 12.8813   LearningRate 0.0733   Epoch: 2   Global Step: 35800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:54,379-Speed 3093.19 samples/sec   Loss 12.9781   LearningRate 0.0732   Epoch: 2   Global Step: 35810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:05:57,723-Speed 3062.72 samples/sec   Loss 13.0098   LearningRate 0.0732   Epoch: 2   Global Step: 35820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:01,016-Speed 3111.24 samples/sec   Loss 13.1611   LearningRate 0.0732   Epoch: 2   Global Step: 35830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:04,374-Speed 3049.60 samples/sec   Loss 13.1241   LearningRate 0.0732   Epoch: 2   Global Step: 35840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:07,688-Speed 3091.46 samples/sec   Loss 12.9969   LearningRate 0.0732   Epoch: 2   Global Step: 35850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:10,993-Speed 3099.43 samples/sec   Loss 12.9535   LearningRate 0.0732   Epoch: 2   Global Step: 35860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:14,295-Speed 3102.24 samples/sec   Loss 13.0856   LearningRate 0.0732   Epoch: 2   Global Step: 35870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:17,621-Speed 3079.09 samples/sec   Loss 12.8756   LearningRate 0.0732   Epoch: 2   Global Step: 35880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:20,944-Speed 3083.32 samples/sec   Loss 13.1147   LearningRate 0.0732   Epoch: 2   Global Step: 35890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:24,274-Speed 3076.07 samples/sec   Loss 12.9846   LearningRate 0.0732   Epoch: 2   Global Step: 35900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:27,582-Speed 3096.04 samples/sec   Loss 12.9816   LearningRate 0.0732   Epoch: 2   Global Step: 35910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:30,854-Speed 3133.26 samples/sec   Loss 12.9605   LearningRate 0.0732   Epoch: 2   Global Step: 35920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:34,191-Speed 3069.12 samples/sec   Loss 12.9396   LearningRate 0.0732   Epoch: 2   Global Step: 35930   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 05:06:37,510-Speed 3086.40 samples/sec   Loss 12.8718   LearningRate 0.0732   Epoch: 2   Global Step: 35940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:40,827-Speed 3088.54 samples/sec   Loss 12.9024   LearningRate 0.0732   Epoch: 2   Global Step: 35950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:44,252-Speed 2990.67 samples/sec   Loss 13.0007   LearningRate 0.0731   Epoch: 2   Global Step: 35960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:47,626-Speed 3036.01 samples/sec   Loss 13.0210   LearningRate 0.0731   Epoch: 2   Global Step: 35970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:50,999-Speed 3036.52 samples/sec   Loss 13.0151   LearningRate 0.0731   Epoch: 2   Global Step: 35980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:54,280-Speed 3122.43 samples/sec   Loss 13.0522   LearningRate 0.0731   Epoch: 2   Global Step: 35990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:06:57,621-Speed 3066.05 samples/sec   Loss 13.0137   LearningRate 0.0731   Epoch: 2   Global Step: 36000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:01,009-Speed 3023.27 samples/sec   Loss 12.8956   LearningRate 0.0731   Epoch: 2   Global Step: 36010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:04,337-Speed 3078.66 samples/sec   Loss 13.1117   LearningRate 0.0731   Epoch: 2   Global Step: 36020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:07,643-Speed 3098.35 samples/sec   Loss 12.9829   LearningRate 0.0731   Epoch: 2   Global Step: 36030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:10,946-Speed 3101.50 samples/sec   Loss 12.9416   LearningRate 0.0731   Epoch: 2   Global Step: 36040   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 05:07:14,251-Speed 3099.52 samples/sec   Loss 13.0614   LearningRate 0.0731   Epoch: 2   Global Step: 36050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:17,560-Speed 3095.24 samples/sec   Loss 13.0021   LearningRate 0.0731   Epoch: 2   Global Step: 36060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:20,897-Speed 3069.50 samples/sec   Loss 12.9547   LearningRate 0.0731   Epoch: 2   Global Step: 36070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:24,208-Speed 3093.60 samples/sec   Loss 12.8756   LearningRate 0.0731   Epoch: 2   Global Step: 36080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:27,526-Speed 3087.91 samples/sec   Loss 12.8037   LearningRate 0.0731   Epoch: 2   Global Step: 36090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:30,823-Speed 3106.28 samples/sec   Loss 12.9073   LearningRate 0.0730   Epoch: 2   Global Step: 36100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:34,096-Speed 3130.03 samples/sec   Loss 12.9273   LearningRate 0.0730   Epoch: 2   Global Step: 36110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:37,427-Speed 3074.30 samples/sec   Loss 13.1129   LearningRate 0.0730   Epoch: 2   Global Step: 36120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:40,767-Speed 3067.40 samples/sec   Loss 12.9305   LearningRate 0.0730   Epoch: 2   Global Step: 36130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:44,100-Speed 3072.62 samples/sec   Loss 12.8100   LearningRate 0.0730   Epoch: 2   Global Step: 36140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:47,387-Speed 3116.72 samples/sec   Loss 12.9023   LearningRate 0.0730   Epoch: 2   Global Step: 36150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:50,660-Speed 3129.59 samples/sec   Loss 12.9999   LearningRate 0.0730   Epoch: 2   Global Step: 36160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:54,057-Speed 3015.24 samples/sec   Loss 12.8754   LearningRate 0.0730   Epoch: 2   Global Step: 36170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:07:57,425-Speed 3040.60 samples/sec   Loss 12.8482   LearningRate 0.0730   Epoch: 2   Global Step: 36180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:00,730-Speed 3100.19 samples/sec   Loss 12.9470   LearningRate 0.0730   Epoch: 2   Global Step: 36190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:04,096-Speed 3043.34 samples/sec   Loss 12.8553   LearningRate 0.0730   Epoch: 2   Global Step: 36200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:07,480-Speed 3026.70 samples/sec   Loss 12.8754   LearningRate 0.0730   Epoch: 2   Global Step: 36210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:10,781-Speed 3102.84 samples/sec   Loss 12.8222   LearningRate 0.0730   Epoch: 2   Global Step: 36220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:14,076-Speed 3108.67 samples/sec   Loss 13.0676   LearningRate 0.0730   Epoch: 2   Global Step: 36230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:17,366-Speed 3113.20 samples/sec   Loss 12.8813   LearningRate 0.0730   Epoch: 2   Global Step: 36240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:20,642-Speed 3126.14 samples/sec   Loss 12.9625   LearningRate 0.0729   Epoch: 2   Global Step: 36250   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 05:08:23,960-Speed 3087.40 samples/sec   Loss 12.9066   LearningRate 0.0729   Epoch: 2   Global Step: 36260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:27,292-Speed 3073.83 samples/sec   Loss 13.0432   LearningRate 0.0729   Epoch: 2   Global Step: 36270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:30,590-Speed 3106.84 samples/sec   Loss 12.8477   LearningRate 0.0729   Epoch: 2   Global Step: 36280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:33,894-Speed 3100.20 samples/sec   Loss 12.9276   LearningRate 0.0729   Epoch: 2   Global Step: 36290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:37,170-Speed 3127.00 samples/sec   Loss 13.0060   LearningRate 0.0729   Epoch: 2   Global Step: 36300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:40,461-Speed 3112.41 samples/sec   Loss 12.8923   LearningRate 0.0729   Epoch: 2   Global Step: 36310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:43,815-Speed 3053.82 samples/sec   Loss 12.8831   LearningRate 0.0729   Epoch: 2   Global Step: 36320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:47,191-Speed 3034.37 samples/sec   Loss 12.9038   LearningRate 0.0729   Epoch: 2   Global Step: 36330   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:50,585-Speed 3017.73 samples/sec   Loss 12.9913   LearningRate 0.0729   Epoch: 2   Global Step: 36340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:53,890-Speed 3099.23 samples/sec   Loss 12.7886   LearningRate 0.0729   Epoch: 2   Global Step: 36350   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:08:57,189-Speed 3104.50 samples/sec   Loss 12.8896   LearningRate 0.0729   Epoch: 2   Global Step: 36360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:00,525-Speed 3070.67 samples/sec   Loss 12.6755   LearningRate 0.0729   Epoch: 2   Global Step: 36370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:03,882-Speed 3051.68 samples/sec   Loss 13.0536   LearningRate 0.0729   Epoch: 2   Global Step: 36380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:07,221-Speed 3067.21 samples/sec   Loss 12.7660   LearningRate 0.0728   Epoch: 2   Global Step: 36390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:10,533-Speed 3092.35 samples/sec   Loss 12.9223   LearningRate 0.0728   Epoch: 2   Global Step: 36400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:13,858-Speed 3081.44 samples/sec   Loss 13.0208   LearningRate 0.0728   Epoch: 2   Global Step: 36410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:17,170-Speed 3092.04 samples/sec   Loss 12.9870   LearningRate 0.0728   Epoch: 2   Global Step: 36420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:20,516-Speed 3061.21 samples/sec   Loss 13.0470   LearningRate 0.0728   Epoch: 2   Global Step: 36430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:23,809-Speed 3111.34 samples/sec   Loss 12.8917   LearningRate 0.0728   Epoch: 2   Global Step: 36440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:27,114-Speed 3099.11 samples/sec   Loss 12.9454   LearningRate 0.0728   Epoch: 2   Global Step: 36450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:30,476-Speed 3046.22 samples/sec   Loss 12.8939   LearningRate 0.0728   Epoch: 2   Global Step: 36460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:33,844-Speed 3042.17 samples/sec   Loss 12.9213   LearningRate 0.0728   Epoch: 2   Global Step: 36470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:37,217-Speed 3035.91 samples/sec   Loss 12.6987   LearningRate 0.0728   Epoch: 2   Global Step: 36480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:40,586-Speed 3040.46 samples/sec   Loss 12.7704   LearningRate 0.0728   Epoch: 2   Global Step: 36490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:43,958-Speed 3037.76 samples/sec   Loss 12.8202   LearningRate 0.0728   Epoch: 2   Global Step: 36500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:47,349-Speed 3020.70 samples/sec   Loss 12.9386   LearningRate 0.0728   Epoch: 2   Global Step: 36510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:50,696-Speed 3060.02 samples/sec   Loss 12.8461   LearningRate 0.0728   Epoch: 2   Global Step: 36520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:54,053-Speed 3051.82 samples/sec   Loss 12.9890   LearningRate 0.0728   Epoch: 2   Global Step: 36530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:09:57,405-Speed 3055.13 samples/sec   Loss 12.9805   LearningRate 0.0727   Epoch: 2   Global Step: 36540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:00,794-Speed 3022.33 samples/sec   Loss 12.7804   LearningRate 0.0727   Epoch: 2   Global Step: 36550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:04,172-Speed 3032.99 samples/sec   Loss 12.9932   LearningRate 0.0727   Epoch: 2   Global Step: 36560   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 05:10:07,595-Speed 2991.62 samples/sec   Loss 12.8991   LearningRate 0.0727   Epoch: 2   Global Step: 36570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:10,904-Speed 3096.02 samples/sec   Loss 12.7684   LearningRate 0.0727   Epoch: 2   Global Step: 36580   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:14,204-Speed 3104.45 samples/sec   Loss 12.9101   LearningRate 0.0727   Epoch: 2   Global Step: 36590   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:17,483-Speed 3123.17 samples/sec   Loss 12.9448   LearningRate 0.0727   Epoch: 2   Global Step: 36600   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:20,812-Speed 3077.10 samples/sec   Loss 12.9974   LearningRate 0.0727   Epoch: 2   Global Step: 36610   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:24,124-Speed 3093.07 samples/sec   Loss 13.0074   LearningRate 0.0727   Epoch: 2   Global Step: 36620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:27,531-Speed 3005.83 samples/sec   Loss 12.9626   LearningRate 0.0727   Epoch: 2   Global Step: 36630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:30,883-Speed 3056.45 samples/sec   Loss 12.9321   LearningRate 0.0727   Epoch: 2   Global Step: 36640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:34,263-Speed 3030.25 samples/sec   Loss 12.6570   LearningRate 0.0727   Epoch: 2   Global Step: 36650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:37,554-Speed 3112.68 samples/sec   Loss 12.9307   LearningRate 0.0727   Epoch: 2   Global Step: 36660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:40,894-Speed 3066.36 samples/sec   Loss 12.9936   LearningRate 0.0727   Epoch: 2   Global Step: 36670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:44,240-Speed 3062.23 samples/sec   Loss 12.9812   LearningRate 0.0726   Epoch: 2   Global Step: 36680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:47,577-Speed 3068.69 samples/sec   Loss 12.8401   LearningRate 0.0726   Epoch: 2   Global Step: 36690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:50,889-Speed 3093.00 samples/sec   Loss 12.9790   LearningRate 0.0726   Epoch: 2   Global Step: 36700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:54,207-Speed 3087.40 samples/sec   Loss 13.0494   LearningRate 0.0726   Epoch: 2   Global Step: 36710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:10:57,548-Speed 3065.82 samples/sec   Loss 13.0596   LearningRate 0.0726   Epoch: 2   Global Step: 36720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:00,879-Speed 3075.58 samples/sec   Loss 13.0030   LearningRate 0.0726   Epoch: 2   Global Step: 36730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:04,185-Speed 3098.10 samples/sec   Loss 12.8777   LearningRate 0.0726   Epoch: 2   Global Step: 36740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:07,488-Speed 3101.40 samples/sec   Loss 12.8691   LearningRate 0.0726   Epoch: 2   Global Step: 36750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:10,821-Speed 3072.82 samples/sec   Loss 12.8913   LearningRate 0.0726   Epoch: 2   Global Step: 36760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:14,266-Speed 2973.72 samples/sec   Loss 12.7072   LearningRate 0.0726   Epoch: 2   Global Step: 36770   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 05:11:17,608-Speed 3064.47 samples/sec   Loss 12.8755   LearningRate 0.0726   Epoch: 2   Global Step: 36780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:20,937-Speed 3077.50 samples/sec   Loss 12.9637   LearningRate 0.0726   Epoch: 2   Global Step: 36790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:24,291-Speed 3053.98 samples/sec   Loss 12.8786   LearningRate 0.0726   Epoch: 2   Global Step: 36800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:27,637-Speed 3061.64 samples/sec   Loss 12.8534   LearningRate 0.0726   Epoch: 2   Global Step: 36810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:30,941-Speed 3099.99 samples/sec   Loss 12.8979   LearningRate 0.0726   Epoch: 2   Global Step: 36820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:34,268-Speed 3078.16 samples/sec   Loss 12.9558   LearningRate 0.0725   Epoch: 2   Global Step: 36830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:37,580-Speed 3093.59 samples/sec   Loss 12.9081   LearningRate 0.0725   Epoch: 2   Global Step: 36840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:40,869-Speed 3113.38 samples/sec   Loss 12.9030   LearningRate 0.0725   Epoch: 2   Global Step: 36850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:44,191-Speed 3084.24 samples/sec   Loss 12.7949   LearningRate 0.0725   Epoch: 2   Global Step: 36860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:47,526-Speed 3071.28 samples/sec   Loss 12.8247   LearningRate 0.0725   Epoch: 2   Global Step: 36870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:50,823-Speed 3106.33 samples/sec   Loss 12.9763   LearningRate 0.0725   Epoch: 2   Global Step: 36880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:54,162-Speed 3068.34 samples/sec   Loss 12.9014   LearningRate 0.0725   Epoch: 2   Global Step: 36890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:11:57,445-Speed 3119.97 samples/sec   Loss 12.9474   LearningRate 0.0725   Epoch: 2   Global Step: 36900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:00,784-Speed 3067.44 samples/sec   Loss 13.1510   LearningRate 0.0725   Epoch: 2   Global Step: 36910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:04,143-Speed 3049.41 samples/sec   Loss 12.8148   LearningRate 0.0725   Epoch: 2   Global Step: 36920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:07,476-Speed 3072.95 samples/sec   Loss 12.9353   LearningRate 0.0725   Epoch: 2   Global Step: 36930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:10,822-Speed 3061.45 samples/sec   Loss 12.7645   LearningRate 0.0725   Epoch: 2   Global Step: 36940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:14,083-Speed 3141.26 samples/sec   Loss 12.8052   LearningRate 0.0725   Epoch: 2   Global Step: 36950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:17,413-Speed 3075.07 samples/sec   Loss 12.9229   LearningRate 0.0725   Epoch: 2   Global Step: 36960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:20,729-Speed 3089.54 samples/sec   Loss 12.8385   LearningRate 0.0725   Epoch: 2   Global Step: 36970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:24,089-Speed 3048.65 samples/sec   Loss 12.9046   LearningRate 0.0724   Epoch: 2   Global Step: 36980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:27,452-Speed 3044.99 samples/sec   Loss 12.7703   LearningRate 0.0724   Epoch: 2   Global Step: 36990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:30,790-Speed 3068.64 samples/sec   Loss 12.8828   LearningRate 0.0724   Epoch: 2   Global Step: 37000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:34,101-Speed 3093.42 samples/sec   Loss 12.8340   LearningRate 0.0724   Epoch: 2   Global Step: 37010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:37,421-Speed 3085.13 samples/sec   Loss 13.0160   LearningRate 0.0724   Epoch: 2   Global Step: 37020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:40,755-Speed 3072.47 samples/sec   Loss 13.0049   LearningRate 0.0724   Epoch: 2   Global Step: 37030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:44,152-Speed 3015.13 samples/sec   Loss 12.9755   LearningRate 0.0724   Epoch: 2   Global Step: 37040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:47,465-Speed 3091.49 samples/sec   Loss 12.9060   LearningRate 0.0724   Epoch: 2   Global Step: 37050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:50,839-Speed 3035.99 samples/sec   Loss 12.8630   LearningRate 0.0724   Epoch: 2   Global Step: 37060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:54,198-Speed 3049.77 samples/sec   Loss 12.8024   LearningRate 0.0724   Epoch: 2   Global Step: 37070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:12:57,509-Speed 3093.24 samples/sec   Loss 12.8654   LearningRate 0.0724   Epoch: 2   Global Step: 37080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:00,884-Speed 3035.63 samples/sec   Loss 12.9864   LearningRate 0.0724   Epoch: 2   Global Step: 37090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:04,241-Speed 3050.92 samples/sec   Loss 13.0573   LearningRate 0.0724   Epoch: 2   Global Step: 37100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:07,536-Speed 3108.83 samples/sec   Loss 12.9189   LearningRate 0.0724   Epoch: 2   Global Step: 37110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:10,934-Speed 3013.85 samples/sec   Loss 12.9223   LearningRate 0.0723   Epoch: 2   Global Step: 37120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:14,237-Speed 3101.79 samples/sec   Loss 12.8100   LearningRate 0.0723   Epoch: 2   Global Step: 37130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:17,575-Speed 3068.10 samples/sec   Loss 12.7477   LearningRate 0.0723   Epoch: 2   Global Step: 37140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:20,926-Speed 3057.12 samples/sec   Loss 12.7529   LearningRate 0.0723   Epoch: 2   Global Step: 37150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:24,259-Speed 3072.61 samples/sec   Loss 12.8531   LearningRate 0.0723   Epoch: 2   Global Step: 37160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:27,528-Speed 3133.84 samples/sec   Loss 12.9308   LearningRate 0.0723   Epoch: 2   Global Step: 37170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:30,919-Speed 3020.96 samples/sec   Loss 12.8848   LearningRate 0.0723   Epoch: 2   Global Step: 37180   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 05:13:34,233-Speed 3090.02 samples/sec   Loss 12.9662   LearningRate 0.0723   Epoch: 2   Global Step: 37190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:37,499-Speed 3136.67 samples/sec   Loss 12.9804   LearningRate 0.0723   Epoch: 2   Global Step: 37200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:40,905-Speed 3007.34 samples/sec   Loss 12.6714   LearningRate 0.0723   Epoch: 2   Global Step: 37210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:44,232-Speed 3078.00 samples/sec   Loss 12.6555   LearningRate 0.0723   Epoch: 2   Global Step: 37220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:47,595-Speed 3045.75 samples/sec   Loss 12.6770   LearningRate 0.0723   Epoch: 2   Global Step: 37230   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:50,939-Speed 3063.60 samples/sec   Loss 12.8288   LearningRate 0.0723   Epoch: 2   Global Step: 37240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:54,230-Speed 3111.88 samples/sec   Loss 13.0078   LearningRate 0.0723   Epoch: 2   Global Step: 37250   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:13:57,974-Speed 2735.66 samples/sec   Loss 12.9341   LearningRate 0.0723   Epoch: 2   Global Step: 37260   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:14:30,995-Speed 310.12 samples/sec   Loss 11.9205   LearningRate 0.0722   Epoch: 3   Global Step: 37270   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:14:34,426-Speed 2986.57 samples/sec   Loss 11.5031   LearningRate 0.0722   Epoch: 3   Global Step: 37280   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:14:37,940-Speed 2914.85 samples/sec   Loss 11.3419   LearningRate 0.0722   Epoch: 3   Global Step: 37290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:14:41,227-Speed 3116.31 samples/sec   Loss 11.2460   LearningRate 0.0722   Epoch: 3   Global Step: 37300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:14:44,555-Speed 3078.11 samples/sec   Loss 11.3438   LearningRate 0.0722   Epoch: 3   Global Step: 37310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:14:47,883-Speed 3078.09 samples/sec   Loss 11.4066   LearningRate 0.0722   Epoch: 3   Global Step: 37320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:14:51,157-Speed 3128.50 samples/sec   Loss 11.4017   LearningRate 0.0722   Epoch: 3   Global Step: 37330   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:14:54,486-Speed 3077.45 samples/sec   Loss 11.4480   LearningRate 0.0722   Epoch: 3   Global Step: 37340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:14:57,848-Speed 3046.24 samples/sec   Loss 11.3209   LearningRate 0.0722   Epoch: 3   Global Step: 37350   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:01,183-Speed 3072.58 samples/sec   Loss 11.3723   LearningRate 0.0722   Epoch: 3   Global Step: 37360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:04,634-Speed 2967.91 samples/sec   Loss 11.3667   LearningRate 0.0722   Epoch: 3   Global Step: 37370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:08,277-Speed 2811.32 samples/sec   Loss 11.3046   LearningRate 0.0722   Epoch: 3   Global Step: 37380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:11,596-Speed 3086.62 samples/sec   Loss 11.4139   LearningRate 0.0722   Epoch: 3   Global Step: 37390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:14,905-Speed 3094.99 samples/sec   Loss 11.5662   LearningRate 0.0722   Epoch: 3   Global Step: 37400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:18,241-Speed 3070.70 samples/sec   Loss 11.3513   LearningRate 0.0721   Epoch: 3   Global Step: 37410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:21,532-Speed 3112.65 samples/sec   Loss 11.4208   LearningRate 0.0721   Epoch: 3   Global Step: 37420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:24,880-Speed 3059.81 samples/sec   Loss 11.4459   LearningRate 0.0721   Epoch: 3   Global Step: 37430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:28,202-Speed 3083.59 samples/sec   Loss 11.4062   LearningRate 0.0721   Epoch: 3   Global Step: 37440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:31,641-Speed 2978.35 samples/sec   Loss 11.4481   LearningRate 0.0721   Epoch: 3   Global Step: 37450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:34,944-Speed 3100.77 samples/sec   Loss 11.5933   LearningRate 0.0721   Epoch: 3   Global Step: 37460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:38,235-Speed 3113.48 samples/sec   Loss 11.4029   LearningRate 0.0721   Epoch: 3   Global Step: 37470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:41,546-Speed 3093.61 samples/sec   Loss 11.5003   LearningRate 0.0721   Epoch: 3   Global Step: 37480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:44,954-Speed 3005.19 samples/sec   Loss 11.4417   LearningRate 0.0721   Epoch: 3   Global Step: 37490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:48,239-Speed 3118.73 samples/sec   Loss 11.5585   LearningRate 0.0721   Epoch: 3   Global Step: 37500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:51,507-Speed 3134.03 samples/sec   Loss 11.5626   LearningRate 0.0721   Epoch: 3   Global Step: 37510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:54,790-Speed 3120.40 samples/sec   Loss 11.5870   LearningRate 0.0721   Epoch: 3   Global Step: 37520   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:15:58,108-Speed 3087.04 samples/sec   Loss 11.5009   LearningRate 0.0721   Epoch: 3   Global Step: 37530   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:16:01,368-Speed 3142.42 samples/sec   Loss 11.4991   LearningRate 0.0721   Epoch: 3   Global Step: 37540   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:16:04,631-Speed 3139.13 samples/sec   Loss 11.4121   LearningRate 0.0721   Epoch: 3   Global Step: 37550   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:16:07,941-Speed 3094.73 samples/sec   Loss 11.5383   LearningRate 0.0720   Epoch: 3   Global Step: 37560   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:16:11,233-Speed 3112.25 samples/sec   Loss 11.4058   LearningRate 0.0720   Epoch: 3   Global Step: 37570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:16:14,542-Speed 3095.39 samples/sec   Loss 11.5314   LearningRate 0.0720   Epoch: 3   Global Step: 37580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:16:17,858-Speed 3088.59 samples/sec   Loss 11.5564   LearningRate 0.0720   Epoch: 3   Global Step: 37590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:16:21,210-Speed 3056.46 samples/sec   Loss 11.6092   LearningRate 0.0720   Epoch: 3   Global Step: 37600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:16:24,531-Speed 3084.05 samples/sec   Loss 11.5646   LearningRate 0.0720   Epoch: 3   Global Step: 37610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:16:27,861-Speed 3076.25 samples/sec   Loss 11.6669   LearningRate 0.0720   Epoch: 3   Global Step: 37620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:16:31,157-Speed 3108.07 samples/sec   Loss 11.6234   LearningRate 0.0720   Epoch: 3   Global Step: 37630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:16:34,416-Speed 3142.78 samples/sec   Loss 11.6427   LearningRate 0.0720   Epoch: 3   Global Step: 37640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:16:37,799-Speed 3028.15 samples/sec   Loss 11.6192   LearningRate 0.0720   Epoch: 3   Global Step: 37650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:16:41,068-Speed 3133.28 samples/sec   Loss 11.6650   LearningRate 0.0720   Epoch: 3   Global Step: 37660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:16:44,367-Speed 3104.49 samples/sec   Loss 11.6262   LearningRate 0.0720   Epoch: 3   Global Step: 37670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:16:47,681-Speed 3090.91 samples/sec   Loss 11.6907   LearningRate 0.0720   Epoch: 3   Global Step: 37680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:16:51,002-Speed 3085.47 samples/sec   Loss 11.5558   LearningRate 0.0720   Epoch: 3   Global Step: 37690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:16:54,350-Speed 3060.11 samples/sec   Loss 11.6180   LearningRate 0.0720   Epoch: 3   Global Step: 37700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:16:57,725-Speed 3034.75 samples/sec   Loss 11.7682   LearningRate 0.0719   Epoch: 3   Global Step: 37710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:17:01,011-Speed 3117.27 samples/sec   Loss 11.6428   LearningRate 0.0719   Epoch: 3   Global Step: 37720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:17:04,278-Speed 3135.36 samples/sec   Loss 11.8409   LearningRate 0.0719   Epoch: 3   Global Step: 37730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:17:07,560-Speed 3121.23 samples/sec   Loss 11.5750   LearningRate 0.0719   Epoch: 3   Global Step: 37740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:17:10,888-Speed 3077.13 samples/sec   Loss 11.7856   LearningRate 0.0719   Epoch: 3   Global Step: 37750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:17:14,195-Speed 3097.43 samples/sec   Loss 11.7447   LearningRate 0.0719   Epoch: 3   Global Step: 37760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:17:17,489-Speed 3109.94 samples/sec   Loss 11.9200   LearningRate 0.0719   Epoch: 3   Global Step: 37770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:17:20,783-Speed 3109.46 samples/sec   Loss 11.8256   LearningRate 0.0719   Epoch: 3   Global Step: 37780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:17:24,131-Speed 3059.49 samples/sec   Loss 11.7022   LearningRate 0.0719   Epoch: 3   Global Step: 37790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:17:27,485-Speed 3054.64 samples/sec   Loss 11.8034   LearningRate 0.0719   Epoch: 3   Global Step: 37800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:17:30,881-Speed 3016.27 samples/sec   Loss 11.8240   LearningRate 0.0719   Epoch: 3   Global Step: 37810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:17:34,196-Speed 3089.78 samples/sec   Loss 11.6209   LearningRate 0.0719   Epoch: 3   Global Step: 37820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:17:37,483-Speed 3116.01 samples/sec   Loss 11.7351   LearningRate 0.0719   Epoch: 3   Global Step: 37830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:17:40,799-Speed 3089.75 samples/sec   Loss 11.6422   LearningRate 0.0719   Epoch: 3   Global Step: 37840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:17:44,064-Speed 3136.50 samples/sec   Loss 11.7979   LearningRate 0.0718   Epoch: 3   Global Step: 37850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:17:47,400-Speed 3070.72 samples/sec   Loss 11.8036   LearningRate 0.0718   Epoch: 3   Global Step: 37860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:17:51,299-Speed 2626.78 samples/sec   Loss 11.8661   LearningRate 0.0718   Epoch: 3   Global Step: 37870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:17:54,640-Speed 3065.97 samples/sec   Loss 11.8632   LearningRate 0.0718   Epoch: 3   Global Step: 37880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:17:58,006-Speed 3043.47 samples/sec   Loss 11.6830   LearningRate 0.0718   Epoch: 3   Global Step: 37890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:18:01,307-Speed 3103.16 samples/sec   Loss 11.7900   LearningRate 0.0718   Epoch: 3   Global Step: 37900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:18:04,610-Speed 3101.41 samples/sec   Loss 11.8069   LearningRate 0.0718   Epoch: 3   Global Step: 37910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:18:07,934-Speed 3081.10 samples/sec   Loss 11.7834   LearningRate 0.0718   Epoch: 3   Global Step: 37920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:18:11,319-Speed 3026.25 samples/sec   Loss 12.0536   LearningRate 0.0718   Epoch: 3   Global Step: 37930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:18:14,658-Speed 3067.18 samples/sec   Loss 11.9290   LearningRate 0.0718   Epoch: 3   Global Step: 37940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:18:17,970-Speed 3092.93 samples/sec   Loss 11.9530   LearningRate 0.0718   Epoch: 3   Global Step: 37950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:18:21,289-Speed 3086.08 samples/sec   Loss 11.8556   LearningRate 0.0718   Epoch: 3   Global Step: 37960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:18:24,634-Speed 3062.00 samples/sec   Loss 11.9205   LearningRate 0.0718   Epoch: 3   Global Step: 37970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:18:27,985-Speed 3057.02 samples/sec   Loss 11.9414   LearningRate 0.0718   Epoch: 3   Global Step: 37980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:18:31,326-Speed 3065.61 samples/sec   Loss 11.7276   LearningRate 0.0718   Epoch: 3   Global Step: 37990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:18:34,680-Speed 3053.78 samples/sec   Loss 11.9858   LearningRate 0.0717   Epoch: 3   Global Step: 38000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:18:37,995-Speed 3089.92 samples/sec   Loss 11.8407   LearningRate 0.0717   Epoch: 3   Global Step: 38010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:18:41,289-Speed 3109.68 samples/sec   Loss 11.9894   LearningRate 0.0717   Epoch: 3   Global Step: 38020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:18:44,647-Speed 3050.75 samples/sec   Loss 11.9589   LearningRate 0.0717   Epoch: 3   Global Step: 38030   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 05:18:47,987-Speed 3066.58 samples/sec   Loss 11.9274   LearningRate 0.0717   Epoch: 3   Global Step: 38040   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 05:18:51,352-Speed 3043.97 samples/sec   Loss 11.9925   LearningRate 0.0717   Epoch: 3   Global Step: 38050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:18:54,665-Speed 3092.18 samples/sec   Loss 12.0199   LearningRate 0.0717   Epoch: 3   Global Step: 38060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:18:58,006-Speed 3065.79 samples/sec   Loss 11.7407   LearningRate 0.0717   Epoch: 3   Global Step: 38070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:19:01,372-Speed 3043.96 samples/sec   Loss 12.0449   LearningRate 0.0717   Epoch: 3   Global Step: 38080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:19:04,734-Speed 3046.79 samples/sec   Loss 11.8248   LearningRate 0.0717   Epoch: 3   Global Step: 38090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:19:08,059-Speed 3080.52 samples/sec   Loss 12.0219   LearningRate 0.0717   Epoch: 3   Global Step: 38100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:19:11,378-Speed 3086.52 samples/sec   Loss 12.0451   LearningRate 0.0717   Epoch: 3   Global Step: 38110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:19:14,719-Speed 3065.32 samples/sec   Loss 12.0044   LearningRate 0.0717   Epoch: 3   Global Step: 38120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:19:18,044-Speed 3080.60 samples/sec   Loss 11.8876   LearningRate 0.0717   Epoch: 3   Global Step: 38130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:19:21,471-Speed 2988.33 samples/sec   Loss 11.8823   LearningRate 0.0717   Epoch: 3   Global Step: 38140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:19:24,794-Speed 3082.45 samples/sec   Loss 12.1163   LearningRate 0.0716   Epoch: 3   Global Step: 38150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:19:28,136-Speed 3064.97 samples/sec   Loss 12.0786   LearningRate 0.0716   Epoch: 3   Global Step: 38160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:19:31,491-Speed 3053.61 samples/sec   Loss 11.9061   LearningRate 0.0716   Epoch: 3   Global Step: 38170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:19:34,905-Speed 3000.45 samples/sec   Loss 11.9836   LearningRate 0.0716   Epoch: 3   Global Step: 38180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:19:38,230-Speed 3080.53 samples/sec   Loss 12.0793   LearningRate 0.0716   Epoch: 3   Global Step: 38190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:19:41,566-Speed 3069.65 samples/sec   Loss 12.0546   LearningRate 0.0716   Epoch: 3   Global Step: 38200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:19:44,946-Speed 3031.01 samples/sec   Loss 12.1910   LearningRate 0.0716   Epoch: 3   Global Step: 38210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:19:48,241-Speed 3108.26 samples/sec   Loss 12.2086   LearningRate 0.0716   Epoch: 3   Global Step: 38220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:19:51,598-Speed 3052.15 samples/sec   Loss 11.9569   LearningRate 0.0716   Epoch: 3   Global Step: 38230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:19:54,994-Speed 3015.46 samples/sec   Loss 11.9658   LearningRate 0.0716   Epoch: 3   Global Step: 38240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:19:58,346-Speed 3056.04 samples/sec   Loss 11.8958   LearningRate 0.0716   Epoch: 3   Global Step: 38250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:20:01,686-Speed 3067.19 samples/sec   Loss 12.0074   LearningRate 0.0716   Epoch: 3   Global Step: 38260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:20:05,052-Speed 3042.88 samples/sec   Loss 12.2445   LearningRate 0.0716   Epoch: 3   Global Step: 38270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:20:08,390-Speed 3069.04 samples/sec   Loss 12.1076   LearningRate 0.0716   Epoch: 3   Global Step: 38280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:20:11,701-Speed 3093.22 samples/sec   Loss 12.1438   LearningRate 0.0715   Epoch: 3   Global Step: 38290   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:15,061-Speed 3048.36 samples/sec   Loss 12.1913   LearningRate 0.0715   Epoch: 3   Global Step: 38300   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:18,358-Speed 3106.71 samples/sec   Loss 11.9882   LearningRate 0.0715   Epoch: 3   Global Step: 38310   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:21,671-Speed 3092.93 samples/sec   Loss 12.1582   LearningRate 0.0715   Epoch: 3   Global Step: 38320   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:25,016-Speed 3061.55 samples/sec   Loss 12.0691   LearningRate 0.0715   Epoch: 3   Global Step: 38330   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:28,391-Speed 3035.42 samples/sec   Loss 12.1104   LearningRate 0.0715   Epoch: 3   Global Step: 38340   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:31,780-Speed 3022.44 samples/sec   Loss 12.2594   LearningRate 0.0715   Epoch: 3   Global Step: 38350   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:35,069-Speed 3114.15 samples/sec   Loss 12.1816   LearningRate 0.0715   Epoch: 3   Global Step: 38360   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:38,352-Speed 3119.85 samples/sec   Loss 12.3508   LearningRate 0.0715   Epoch: 3   Global Step: 38370   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:41,637-Speed 3118.66 samples/sec   Loss 12.2759   LearningRate 0.0715   Epoch: 3   Global Step: 38380   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:44,943-Speed 3098.28 samples/sec   Loss 12.1389   LearningRate 0.0715   Epoch: 3   Global Step: 38390   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:48,363-Speed 2994.18 samples/sec   Loss 12.1294   LearningRate 0.0715   Epoch: 3   Global Step: 38400   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:51,686-Speed 3083.19 samples/sec   Loss 12.1652   LearningRate 0.0715   Epoch: 3   Global Step: 38410   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:54,991-Speed 3098.56 samples/sec   Loss 11.9764   LearningRate 0.0715   Epoch: 3   Global Step: 38420   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:20:58,342-Speed 3057.57 samples/sec   Loss 12.2184   LearningRate 0.0715   Epoch: 3   Global Step: 38430   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:21:01,692-Speed 3056.61 samples/sec   Loss 12.0630   LearningRate 0.0714   Epoch: 3   Global Step: 38440   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:21:05,019-Speed 3079.71 samples/sec   Loss 12.1810   LearningRate 0.0714   Epoch: 3   Global Step: 38450   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:21:08,368-Speed 3058.03 samples/sec   Loss 12.0914   LearningRate 0.0714   Epoch: 3   Global Step: 38460   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:21:11,668-Speed 3104.35 samples/sec   Loss 12.0928   LearningRate 0.0714   Epoch: 3   Global Step: 38470   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:21:15,021-Speed 3054.71 samples/sec   Loss 12.2230   LearningRate 0.0714   Epoch: 3   Global Step: 38480   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:21:18,315-Speed 3109.12 samples/sec   Loss 12.1104   LearningRate 0.0714   Epoch: 3   Global Step: 38490   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:21:21,680-Speed 3044.32 samples/sec   Loss 12.2006   LearningRate 0.0714   Epoch: 3   Global Step: 38500   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:21:25,009-Speed 3076.58 samples/sec   Loss 12.2275   LearningRate 0.0714   Epoch: 3   Global Step: 38510   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:21:28,303-Speed 3109.52 samples/sec   Loss 12.1747   LearningRate 0.0714   Epoch: 3   Global Step: 38520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:21:31,586-Speed 3120.20 samples/sec   Loss 12.2843   LearningRate 0.0714   Epoch: 3   Global Step: 38530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:21:34,920-Speed 3072.07 samples/sec   Loss 12.1800   LearningRate 0.0714   Epoch: 3   Global Step: 38540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:21:38,287-Speed 3042.38 samples/sec   Loss 12.2954   LearningRate 0.0714   Epoch: 3   Global Step: 38550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:21:41,582-Speed 3108.35 samples/sec   Loss 12.1370   LearningRate 0.0714   Epoch: 3   Global Step: 38560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:21:44,937-Speed 3053.42 samples/sec   Loss 12.1298   LearningRate 0.0714   Epoch: 3   Global Step: 38570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:21:48,319-Speed 3028.77 samples/sec   Loss 12.3969   LearningRate 0.0714   Epoch: 3   Global Step: 38580   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:21:51,679-Speed 3048.18 samples/sec   Loss 12.1990   LearningRate 0.0713   Epoch: 3   Global Step: 38590   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:21:54,996-Speed 3087.57 samples/sec   Loss 12.2661   LearningRate 0.0713   Epoch: 3   Global Step: 38600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:21:58,306-Speed 3095.26 samples/sec   Loss 12.0357   LearningRate 0.0713   Epoch: 3   Global Step: 38610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 05:22:01,642-Speed 3070.26 samples/sec   Loss 12.1960   LearningRate 0.0713   Epoch: 3   Global Step: 38620   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:04,998-Speed 3051.78 samples/sec   Loss 12.1963   LearningRate 0.0713   Epoch: 3   Global Step: 38630   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:08,370-Speed 3037.87 samples/sec   Loss 11.9942   LearningRate 0.0713   Epoch: 3   Global Step: 38640   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:11,723-Speed 3057.52 samples/sec   Loss 12.3534   LearningRate 0.0713   Epoch: 3   Global Step: 38650   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:15,173-Speed 2968.90 samples/sec   Loss 12.2677   LearningRate 0.0713   Epoch: 3   Global Step: 38660   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:18,493-Speed 3085.72 samples/sec   Loss 12.2452   LearningRate 0.0713   Epoch: 3   Global Step: 38670   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:21,798-Speed 3099.44 samples/sec   Loss 12.1382   LearningRate 0.0713   Epoch: 3   Global Step: 38680   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:25,188-Speed 3021.17 samples/sec   Loss 12.3438   LearningRate 0.0713   Epoch: 3   Global Step: 38690   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:28,458-Speed 3131.95 samples/sec   Loss 12.2088   LearningRate 0.0713   Epoch: 3   Global Step: 38700   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:31,725-Speed 3135.79 samples/sec   Loss 12.2470   LearningRate 0.0713   Epoch: 3   Global Step: 38710   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:35,007-Speed 3119.98 samples/sec   Loss 12.3032   LearningRate 0.0713   Epoch: 3   Global Step: 38720   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:38,331-Speed 3081.66 samples/sec   Loss 12.2575   LearningRate 0.0712   Epoch: 3   Global Step: 38730   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:41,697-Speed 3043.41 samples/sec   Loss 12.2594   LearningRate 0.0712   Epoch: 3   Global Step: 38740   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:44,983-Speed 3116.16 samples/sec   Loss 12.1834   LearningRate 0.0712   Epoch: 3   Global Step: 38750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:48,360-Speed 3034.29 samples/sec   Loss 12.1471   LearningRate 0.0712   Epoch: 3   Global Step: 38760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:51,631-Speed 3131.23 samples/sec   Loss 12.1082   LearningRate 0.0712   Epoch: 3   Global Step: 38770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:55,067-Speed 2980.20 samples/sec   Loss 12.3610   LearningRate 0.0712   Epoch: 3   Global Step: 38780   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:22:58,393-Speed 3079.69 samples/sec   Loss 12.1750   LearningRate 0.0712   Epoch: 3   Global Step: 38790   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:01,813-Speed 2995.16 samples/sec   Loss 12.1507   LearningRate 0.0712   Epoch: 3   Global Step: 38800   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:05,186-Speed 3037.77 samples/sec   Loss 12.3541   LearningRate 0.0712   Epoch: 3   Global Step: 38810   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:08,511-Speed 3079.84 samples/sec   Loss 12.1784   LearningRate 0.0712   Epoch: 3   Global Step: 38820   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:11,811-Speed 3104.73 samples/sec   Loss 12.3251   LearningRate 0.0712   Epoch: 3   Global Step: 38830   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:15,173-Speed 3046.32 samples/sec   Loss 12.3657   LearningRate 0.0712   Epoch: 3   Global Step: 38840   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:18,485-Speed 3092.75 samples/sec   Loss 12.3164   LearningRate 0.0712   Epoch: 3   Global Step: 38850   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:21,838-Speed 3054.24 samples/sec   Loss 12.1710   LearningRate 0.0712   Epoch: 3   Global Step: 38860   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:25,119-Speed 3121.71 samples/sec   Loss 12.2899   LearningRate 0.0712   Epoch: 3   Global Step: 38870   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:28,413-Speed 3110.33 samples/sec   Loss 12.3232   LearningRate 0.0711   Epoch: 3   Global Step: 38880   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:31,733-Speed 3085.40 samples/sec   Loss 12.0157   LearningRate 0.0711   Epoch: 3   Global Step: 38890   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:35,075-Speed 3064.47 samples/sec   Loss 12.3465   LearningRate 0.0711   Epoch: 3   Global Step: 38900   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:38,363-Speed 3115.60 samples/sec   Loss 12.2577   LearningRate 0.0711   Epoch: 3   Global Step: 38910   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:41,663-Speed 3104.40 samples/sec   Loss 12.2596   LearningRate 0.0711   Epoch: 3   Global Step: 38920   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:45,030-Speed 3042.69 samples/sec   Loss 12.3845   LearningRate 0.0711   Epoch: 3   Global Step: 38930   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:48,397-Speed 3041.96 samples/sec   Loss 12.3351   LearningRate 0.0711   Epoch: 3   Global Step: 38940   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:51,756-Speed 3049.52 samples/sec   Loss 12.3550   LearningRate 0.0711   Epoch: 3   Global Step: 38950   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:55,047-Speed 3111.83 samples/sec   Loss 12.3568   LearningRate 0.0711   Epoch: 3   Global Step: 38960   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:23:58,401-Speed 3054.39 samples/sec   Loss 12.3820   LearningRate 0.0711   Epoch: 3   Global Step: 38970   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:01,726-Speed 3080.92 samples/sec   Loss 12.2201   LearningRate 0.0711   Epoch: 3   Global Step: 38980   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:05,017-Speed 3112.03 samples/sec   Loss 12.2793   LearningRate 0.0711   Epoch: 3   Global Step: 38990   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:08,326-Speed 3095.43 samples/sec   Loss 12.3387   LearningRate 0.0711   Epoch: 3   Global Step: 39000   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:11,658-Speed 3075.21 samples/sec   Loss 12.3265   LearningRate 0.0711   Epoch: 3   Global Step: 39010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:14,972-Speed 3090.59 samples/sec   Loss 12.3376   LearningRate 0.0711   Epoch: 3   Global Step: 39020   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 05:24:18,260-Speed 3115.61 samples/sec   Loss 12.4751   LearningRate 0.0710   Epoch: 3   Global Step: 39030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:21,550-Speed 3113.06 samples/sec   Loss 12.4039   LearningRate 0.0710   Epoch: 3   Global Step: 39040   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:24,891-Speed 3065.71 samples/sec   Loss 12.4563   LearningRate 0.0710   Epoch: 3   Global Step: 39050   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:28,268-Speed 3032.94 samples/sec   Loss 12.3651   LearningRate 0.0710   Epoch: 3   Global Step: 39060   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:31,602-Speed 3073.52 samples/sec   Loss 12.4255   LearningRate 0.0710   Epoch: 3   Global Step: 39070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:34,919-Speed 3088.16 samples/sec   Loss 12.3663   LearningRate 0.0710   Epoch: 3   Global Step: 39080   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:38,338-Speed 2996.15 samples/sec   Loss 12.3953   LearningRate 0.0710   Epoch: 3   Global Step: 39090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:41,746-Speed 3005.02 samples/sec   Loss 12.3752   LearningRate 0.0710   Epoch: 3   Global Step: 39100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:45,126-Speed 3031.03 samples/sec   Loss 12.2951   LearningRate 0.0710   Epoch: 3   Global Step: 39110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:48,475-Speed 3058.25 samples/sec   Loss 12.3152   LearningRate 0.0710   Epoch: 3   Global Step: 39120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:51,731-Speed 3145.92 samples/sec   Loss 12.4722   LearningRate 0.0710   Epoch: 3   Global Step: 39130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:55,106-Speed 3034.80 samples/sec   Loss 12.4999   LearningRate 0.0710   Epoch: 3   Global Step: 39140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:24:58,430-Speed 3081.30 samples/sec   Loss 12.4214   LearningRate 0.0710   Epoch: 3   Global Step: 39150   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:25:01,813-Speed 3028.03 samples/sec   Loss 12.4076   LearningRate 0.0710   Epoch: 3   Global Step: 39160   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:25:05,120-Speed 3097.72 samples/sec   Loss 12.4416   LearningRate 0.0710   Epoch: 3   Global Step: 39170   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:25:08,491-Speed 3038.95 samples/sec   Loss 12.5211   LearningRate 0.0709   Epoch: 3   Global Step: 39180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:25:11,842-Speed 3056.88 samples/sec   Loss 12.3173   LearningRate 0.0709   Epoch: 3   Global Step: 39190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:25:15,150-Speed 3095.86 samples/sec   Loss 12.5403   LearningRate 0.0709   Epoch: 3   Global Step: 39200   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:25:18,428-Speed 3125.45 samples/sec   Loss 12.5795   LearningRate 0.0709   Epoch: 3   Global Step: 39210   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:25:21,766-Speed 3068.89 samples/sec   Loss 12.3762   LearningRate 0.0709   Epoch: 3   Global Step: 39220   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:25:25,065-Speed 3104.51 samples/sec   Loss 12.3915   LearningRate 0.0709   Epoch: 3   Global Step: 39230   Fp16 Grad Scale: 262144   Required: 20 hours
Training: 2022-04-27 05:25:28,351-Speed 3118.28 samples/sec   Loss 12.3515   LearningRate 0.0709   Epoch: 3   Global Step: 39240   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 05:25:31,625-Speed 3128.92 samples/sec   Loss 12.4495   LearningRate 0.0709   Epoch: 3   Global Step: 39250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:25:34,980-Speed 3054.13 samples/sec   Loss 12.2142   LearningRate 0.0709   Epoch: 3   Global Step: 39260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:25:38,274-Speed 3109.35 samples/sec   Loss 12.4218   LearningRate 0.0709   Epoch: 3   Global Step: 39270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:25:41,625-Speed 3056.39 samples/sec   Loss 12.4805   LearningRate 0.0709   Epoch: 3   Global Step: 39280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:25:44,964-Speed 3067.50 samples/sec   Loss 12.4209   LearningRate 0.0709   Epoch: 3   Global Step: 39290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:25:48,270-Speed 3099.06 samples/sec   Loss 12.5576   LearningRate 0.0709   Epoch: 3   Global Step: 39300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:25:51,648-Speed 3032.32 samples/sec   Loss 12.5759   LearningRate 0.0709   Epoch: 3   Global Step: 39310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:25:55,074-Speed 2990.16 samples/sec   Loss 12.3980   LearningRate 0.0708   Epoch: 3   Global Step: 39320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:25:58,464-Speed 3021.35 samples/sec   Loss 12.4466   LearningRate 0.0708   Epoch: 3   Global Step: 39330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:26:01,799-Speed 3071.70 samples/sec   Loss 12.4160   LearningRate 0.0708   Epoch: 3   Global Step: 39340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:26:05,175-Speed 3033.61 samples/sec   Loss 12.2531   LearningRate 0.0708   Epoch: 3   Global Step: 39350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:26:08,549-Speed 3036.25 samples/sec   Loss 12.3855   LearningRate 0.0708   Epoch: 3   Global Step: 39360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:26:11,894-Speed 3062.27 samples/sec   Loss 12.4068   LearningRate 0.0708   Epoch: 3   Global Step: 39370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:26:15,174-Speed 3122.80 samples/sec   Loss 12.4296   LearningRate 0.0708   Epoch: 3   Global Step: 39380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:26:18,522-Speed 3059.89 samples/sec   Loss 12.2724   LearningRate 0.0708   Epoch: 3   Global Step: 39390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:26:21,936-Speed 3000.42 samples/sec   Loss 12.3088   LearningRate 0.0708   Epoch: 3   Global Step: 39400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:26:25,325-Speed 3021.64 samples/sec   Loss 12.4758   LearningRate 0.0708   Epoch: 3   Global Step: 39410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:26:28,693-Speed 3041.86 samples/sec   Loss 12.5726   LearningRate 0.0708   Epoch: 3   Global Step: 39420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:26:31,988-Speed 3108.23 samples/sec   Loss 12.4493   LearningRate 0.0708   Epoch: 3   Global Step: 39430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:26:35,373-Speed 3025.64 samples/sec   Loss 12.5835   LearningRate 0.0708   Epoch: 3   Global Step: 39440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:26:38,783-Speed 3003.87 samples/sec   Loss 12.4324   LearningRate 0.0708   Epoch: 3   Global Step: 39450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:26:42,097-Speed 3091.40 samples/sec   Loss 12.4517   LearningRate 0.0708   Epoch: 3   Global Step: 39460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:26:45,410-Speed 3091.62 samples/sec   Loss 12.4188   LearningRate 0.0707   Epoch: 3   Global Step: 39470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:26:48,761-Speed 3056.97 samples/sec   Loss 12.4432   LearningRate 0.0707   Epoch: 3   Global Step: 39480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:26:53,302-Speed 2255.93 samples/sec   Loss 12.5188   LearningRate 0.0707   Epoch: 3   Global Step: 39490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:26:57,990-Speed 2184.69 samples/sec   Loss 12.4680   LearningRate 0.0707   Epoch: 3   Global Step: 39500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:27:01,320-Speed 3076.26 samples/sec   Loss 12.4680   LearningRate 0.0707   Epoch: 3   Global Step: 39510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:27:04,687-Speed 3042.48 samples/sec   Loss 12.3784   LearningRate 0.0707   Epoch: 3   Global Step: 39520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:27:08,038-Speed 3056.63 samples/sec   Loss 12.5120   LearningRate 0.0707   Epoch: 3   Global Step: 39530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:27:11,416-Speed 3032.28 samples/sec   Loss 12.5623   LearningRate 0.0707   Epoch: 3   Global Step: 39540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:27:14,800-Speed 3027.72 samples/sec   Loss 12.3467   LearningRate 0.0707   Epoch: 3   Global Step: 39550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:27:18,126-Speed 3079.27 samples/sec   Loss 12.5298   LearningRate 0.0707   Epoch: 3   Global Step: 39560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:27:21,473-Speed 3061.04 samples/sec   Loss 12.3783   LearningRate 0.0707   Epoch: 3   Global Step: 39570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:27:24,801-Speed 3077.33 samples/sec   Loss 12.5238   LearningRate 0.0707   Epoch: 3   Global Step: 39580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:27:28,088-Speed 3117.70 samples/sec   Loss 12.4797   LearningRate 0.0707   Epoch: 3   Global Step: 39590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:27:31,376-Speed 3115.05 samples/sec   Loss 12.3444   LearningRate 0.0707   Epoch: 3   Global Step: 39600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:27:34,726-Speed 3057.23 samples/sec   Loss 12.2828   LearningRate 0.0707   Epoch: 3   Global Step: 39610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:27:38,115-Speed 3022.30 samples/sec   Loss 12.3502   LearningRate 0.0706   Epoch: 3   Global Step: 39620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:27:41,481-Speed 3044.10 samples/sec   Loss 12.6049   LearningRate 0.0706   Epoch: 3   Global Step: 39630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:27:44,809-Speed 3077.58 samples/sec   Loss 12.5241   LearningRate 0.0706   Epoch: 3   Global Step: 39640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:27:48,122-Speed 3091.59 samples/sec   Loss 12.4046   LearningRate 0.0706   Epoch: 3   Global Step: 39650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:27:51,484-Speed 3046.73 samples/sec   Loss 12.3180   LearningRate 0.0706   Epoch: 3   Global Step: 39660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:27:54,782-Speed 3106.04 samples/sec   Loss 12.4850   LearningRate 0.0706   Epoch: 3   Global Step: 39670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:27:58,068-Speed 3117.31 samples/sec   Loss 12.3339   LearningRate 0.0706   Epoch: 3   Global Step: 39680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:01,475-Speed 3006.10 samples/sec   Loss 12.4977   LearningRate 0.0706   Epoch: 3   Global Step: 39690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:04,866-Speed 3021.17 samples/sec   Loss 12.5315   LearningRate 0.0706   Epoch: 3   Global Step: 39700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:08,219-Speed 3054.99 samples/sec   Loss 12.2665   LearningRate 0.0706   Epoch: 3   Global Step: 39710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:11,617-Speed 3015.36 samples/sec   Loss 12.4228   LearningRate 0.0706   Epoch: 3   Global Step: 39720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:14,906-Speed 3113.96 samples/sec   Loss 12.4609   LearningRate 0.0706   Epoch: 3   Global Step: 39730   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:18,291-Speed 3026.38 samples/sec   Loss 12.4611   LearningRate 0.0706   Epoch: 3   Global Step: 39740   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:21,726-Speed 2981.85 samples/sec   Loss 12.2233   LearningRate 0.0706   Epoch: 3   Global Step: 39750   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:25,064-Speed 3068.70 samples/sec   Loss 12.4592   LearningRate 0.0706   Epoch: 3   Global Step: 39760   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:28,413-Speed 3058.28 samples/sec   Loss 12.3910   LearningRate 0.0705   Epoch: 3   Global Step: 39770   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:31,808-Speed 3017.36 samples/sec   Loss 12.4739   LearningRate 0.0705   Epoch: 3   Global Step: 39780   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:35,131-Speed 3082.60 samples/sec   Loss 12.4855   LearningRate 0.0705   Epoch: 3   Global Step: 39790   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:38,504-Speed 3036.56 samples/sec   Loss 12.5104   LearningRate 0.0705   Epoch: 3   Global Step: 39800   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:28:41,828-Speed 3081.48 samples/sec   Loss 12.3974   LearningRate 0.0705   Epoch: 3   Global Step: 39810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:28:45,134-Speed 3098.13 samples/sec   Loss 12.5955   LearningRate 0.0705   Epoch: 3   Global Step: 39820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:28:48,500-Speed 3043.30 samples/sec   Loss 12.4848   LearningRate 0.0705   Epoch: 3   Global Step: 39830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:28:51,835-Speed 3071.85 samples/sec   Loss 12.4704   LearningRate 0.0705   Epoch: 3   Global Step: 39840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:28:55,200-Speed 3043.37 samples/sec   Loss 12.5525   LearningRate 0.0705   Epoch: 3   Global Step: 39850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:28:58,556-Speed 3052.57 samples/sec   Loss 12.4954   LearningRate 0.0705   Epoch: 3   Global Step: 39860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:29:01,893-Speed 3069.86 samples/sec   Loss 12.4105   LearningRate 0.0705   Epoch: 3   Global Step: 39870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:29:05,166-Speed 3129.25 samples/sec   Loss 12.3791   LearningRate 0.0705   Epoch: 3   Global Step: 39880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:29:08,490-Speed 3081.07 samples/sec   Loss 12.4411   LearningRate 0.0705   Epoch: 3   Global Step: 39890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:29:11,786-Speed 3107.91 samples/sec   Loss 12.3842   LearningRate 0.0705   Epoch: 3   Global Step: 39900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:29:15,112-Speed 3080.06 samples/sec   Loss 12.4662   LearningRate 0.0704   Epoch: 3   Global Step: 39910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:29:18,428-Speed 3089.39 samples/sec   Loss 12.4879   LearningRate 0.0704   Epoch: 3   Global Step: 39920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:29:21,743-Speed 3089.34 samples/sec   Loss 12.4418   LearningRate 0.0704   Epoch: 3   Global Step: 39930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:29:25,081-Speed 3068.85 samples/sec   Loss 12.5027   LearningRate 0.0704   Epoch: 3   Global Step: 39940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:29:28,447-Speed 3043.11 samples/sec   Loss 12.4592   LearningRate 0.0704   Epoch: 3   Global Step: 39950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:29:31,906-Speed 2961.08 samples/sec   Loss 12.5647   LearningRate 0.0704   Epoch: 3   Global Step: 39960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:29:35,193-Speed 3116.65 samples/sec   Loss 12.6003   LearningRate 0.0704   Epoch: 3   Global Step: 39970   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:29:38,480-Speed 3116.22 samples/sec   Loss 12.4405   LearningRate 0.0704   Epoch: 3   Global Step: 39980   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:29:41,861-Speed 3029.15 samples/sec   Loss 12.5136   LearningRate 0.0704   Epoch: 3   Global Step: 39990   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:29:45,171-Speed 3094.78 samples/sec   Loss 12.5506   LearningRate 0.0704   Epoch: 3   Global Step: 40000   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:29:48,530-Speed 3048.92 samples/sec   Loss 12.6624   LearningRate 0.0704   Epoch: 3   Global Step: 40010   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:29:51,923-Speed 3019.90 samples/sec   Loss 12.4312   LearningRate 0.0704   Epoch: 3   Global Step: 40020   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:29:55,301-Speed 3032.19 samples/sec   Loss 12.6056   LearningRate 0.0704   Epoch: 3   Global Step: 40030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:29:58,640-Speed 3067.61 samples/sec   Loss 12.5346   LearningRate 0.0704   Epoch: 3   Global Step: 40040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:01,959-Speed 3085.88 samples/sec   Loss 12.5525   LearningRate 0.0704   Epoch: 3   Global Step: 40050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:05,299-Speed 3066.49 samples/sec   Loss 12.4807   LearningRate 0.0703   Epoch: 3   Global Step: 40060   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:08,689-Speed 3021.75 samples/sec   Loss 12.6405   LearningRate 0.0703   Epoch: 3   Global Step: 40070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:12,006-Speed 3088.03 samples/sec   Loss 12.5056   LearningRate 0.0703   Epoch: 3   Global Step: 40080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:15,358-Speed 3056.07 samples/sec   Loss 12.5758   LearningRate 0.0703   Epoch: 3   Global Step: 40090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:18,658-Speed 3103.78 samples/sec   Loss 12.3732   LearningRate 0.0703   Epoch: 3   Global Step: 40100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:21,988-Speed 3076.23 samples/sec   Loss 12.5930   LearningRate 0.0703   Epoch: 3   Global Step: 40110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:25,284-Speed 3107.59 samples/sec   Loss 12.4137   LearningRate 0.0703   Epoch: 3   Global Step: 40120   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:30:28,612-Speed 3077.88 samples/sec   Loss 12.5349   LearningRate 0.0703   Epoch: 3   Global Step: 40130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:31,944-Speed 3074.71 samples/sec   Loss 12.6049   LearningRate 0.0703   Epoch: 3   Global Step: 40140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:35,316-Speed 3037.57 samples/sec   Loss 12.3937   LearningRate 0.0703   Epoch: 3   Global Step: 40150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:38,644-Speed 3077.47 samples/sec   Loss 12.3588   LearningRate 0.0703   Epoch: 3   Global Step: 40160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:41,973-Speed 3077.33 samples/sec   Loss 12.6301   LearningRate 0.0703   Epoch: 3   Global Step: 40170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:45,348-Speed 3034.47 samples/sec   Loss 12.6130   LearningRate 0.0703   Epoch: 3   Global Step: 40180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:48,687-Speed 3067.76 samples/sec   Loss 12.4910   LearningRate 0.0703   Epoch: 3   Global Step: 40190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:52,002-Speed 3090.48 samples/sec   Loss 12.6477   LearningRate 0.0703   Epoch: 3   Global Step: 40200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:55,362-Speed 3048.52 samples/sec   Loss 12.5845   LearningRate 0.0702   Epoch: 3   Global Step: 40210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:30:58,659-Speed 3107.42 samples/sec   Loss 12.4460   LearningRate 0.0702   Epoch: 3   Global Step: 40220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:02,046-Speed 3023.80 samples/sec   Loss 12.5609   LearningRate 0.0702   Epoch: 3   Global Step: 40230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:05,383-Speed 3069.24 samples/sec   Loss 12.5193   LearningRate 0.0702   Epoch: 3   Global Step: 40240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:08,672-Speed 3114.11 samples/sec   Loss 12.4122   LearningRate 0.0702   Epoch: 3   Global Step: 40250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:11,961-Speed 3115.35 samples/sec   Loss 12.4636   LearningRate 0.0702   Epoch: 3   Global Step: 40260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:15,325-Speed 3044.10 samples/sec   Loss 12.5726   LearningRate 0.0702   Epoch: 3   Global Step: 40270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:18,611-Speed 3118.02 samples/sec   Loss 12.5618   LearningRate 0.0702   Epoch: 3   Global Step: 40280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:21,915-Speed 3099.60 samples/sec   Loss 12.7461   LearningRate 0.0702   Epoch: 3   Global Step: 40290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:25,248-Speed 3073.89 samples/sec   Loss 12.5610   LearningRate 0.0702   Epoch: 3   Global Step: 40300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:28,562-Speed 3090.47 samples/sec   Loss 12.4990   LearningRate 0.0702   Epoch: 3   Global Step: 40310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:31,861-Speed 3104.98 samples/sec   Loss 12.5312   LearningRate 0.0702   Epoch: 3   Global Step: 40320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:35,143-Speed 3121.02 samples/sec   Loss 12.3843   LearningRate 0.0702   Epoch: 3   Global Step: 40330   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:31:38,460-Speed 3088.01 samples/sec   Loss 12.6623   LearningRate 0.0702   Epoch: 3   Global Step: 40340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:41,789-Speed 3076.52 samples/sec   Loss 12.5734   LearningRate 0.0702   Epoch: 3   Global Step: 40350   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:45,100-Speed 3093.58 samples/sec   Loss 12.5473   LearningRate 0.0701   Epoch: 3   Global Step: 40360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:48,408-Speed 3096.14 samples/sec   Loss 12.5449   LearningRate 0.0701   Epoch: 3   Global Step: 40370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:51,746-Speed 3069.07 samples/sec   Loss 12.6228   LearningRate 0.0701   Epoch: 3   Global Step: 40380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:55,101-Speed 3052.66 samples/sec   Loss 12.5582   LearningRate 0.0701   Epoch: 3   Global Step: 40390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:31:58,482-Speed 3029.43 samples/sec   Loss 12.4855   LearningRate 0.0701   Epoch: 3   Global Step: 40400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:01,864-Speed 3029.42 samples/sec   Loss 12.5993   LearningRate 0.0701   Epoch: 3   Global Step: 40410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:05,210-Speed 3061.08 samples/sec   Loss 12.7215   LearningRate 0.0701   Epoch: 3   Global Step: 40420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:08,547-Speed 3069.39 samples/sec   Loss 12.5495   LearningRate 0.0701   Epoch: 3   Global Step: 40430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:11,920-Speed 3037.26 samples/sec   Loss 12.4941   LearningRate 0.0701   Epoch: 3   Global Step: 40440   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:32:15,194-Speed 3128.10 samples/sec   Loss 12.5878   LearningRate 0.0701   Epoch: 3   Global Step: 40450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:18,509-Speed 3089.96 samples/sec   Loss 12.4186   LearningRate 0.0701   Epoch: 3   Global Step: 40460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:21,836-Speed 3078.19 samples/sec   Loss 12.5219   LearningRate 0.0701   Epoch: 3   Global Step: 40470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:25,161-Speed 3081.39 samples/sec   Loss 12.5490   LearningRate 0.0701   Epoch: 3   Global Step: 40480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:28,532-Speed 3038.50 samples/sec   Loss 12.5635   LearningRate 0.0701   Epoch: 3   Global Step: 40490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:31,903-Speed 3038.43 samples/sec   Loss 12.3832   LearningRate 0.0701   Epoch: 3   Global Step: 40500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:35,249-Speed 3061.48 samples/sec   Loss 12.6454   LearningRate 0.0700   Epoch: 3   Global Step: 40510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:38,591-Speed 3064.51 samples/sec   Loss 12.5892   LearningRate 0.0700   Epoch: 3   Global Step: 40520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:41,875-Speed 3118.70 samples/sec   Loss 12.3716   LearningRate 0.0700   Epoch: 3   Global Step: 40530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:45,257-Speed 3029.12 samples/sec   Loss 12.7095   LearningRate 0.0700   Epoch: 3   Global Step: 40540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:48,505-Speed 3153.35 samples/sec   Loss 12.4472   LearningRate 0.0700   Epoch: 3   Global Step: 40550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:51,808-Speed 3101.49 samples/sec   Loss 12.6154   LearningRate 0.0700   Epoch: 3   Global Step: 40560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:55,110-Speed 3102.22 samples/sec   Loss 12.5448   LearningRate 0.0700   Epoch: 3   Global Step: 40570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:32:58,430-Speed 3085.23 samples/sec   Loss 12.4088   LearningRate 0.0700   Epoch: 3   Global Step: 40580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:01,750-Speed 3085.04 samples/sec   Loss 12.3161   LearningRate 0.0700   Epoch: 3   Global Step: 40590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:05,084-Speed 3071.63 samples/sec   Loss 12.6370   LearningRate 0.0700   Epoch: 3   Global Step: 40600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:08,413-Speed 3076.44 samples/sec   Loss 12.6830   LearningRate 0.0700   Epoch: 3   Global Step: 40610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:11,707-Speed 3110.20 samples/sec   Loss 12.7448   LearningRate 0.0700   Epoch: 3   Global Step: 40620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:15,062-Speed 3053.23 samples/sec   Loss 12.4139   LearningRate 0.0700   Epoch: 3   Global Step: 40630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:18,386-Speed 3081.37 samples/sec   Loss 12.4854   LearningRate 0.0700   Epoch: 3   Global Step: 40640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:21,726-Speed 3066.52 samples/sec   Loss 12.5631   LearningRate 0.0700   Epoch: 3   Global Step: 40650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:25,031-Speed 3099.32 samples/sec   Loss 12.7184   LearningRate 0.0699   Epoch: 3   Global Step: 40660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:28,349-Speed 3087.52 samples/sec   Loss 12.4256   LearningRate 0.0699   Epoch: 3   Global Step: 40670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:31,627-Speed 3125.29 samples/sec   Loss 12.4796   LearningRate 0.0699   Epoch: 3   Global Step: 40680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:34,933-Speed 3097.32 samples/sec   Loss 12.5224   LearningRate 0.0699   Epoch: 3   Global Step: 40690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:38,238-Speed 3099.52 samples/sec   Loss 12.5332   LearningRate 0.0699   Epoch: 3   Global Step: 40700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:41,584-Speed 3061.59 samples/sec   Loss 12.5824   LearningRate 0.0699   Epoch: 3   Global Step: 40710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:44,961-Speed 3033.88 samples/sec   Loss 12.4650   LearningRate 0.0699   Epoch: 3   Global Step: 40720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:48,362-Speed 3011.52 samples/sec   Loss 12.4888   LearningRate 0.0699   Epoch: 3   Global Step: 40730   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:51,676-Speed 3091.12 samples/sec   Loss 12.5063   LearningRate 0.0699   Epoch: 3   Global Step: 40740   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:33:54,987-Speed 3093.96 samples/sec   Loss 12.5463   LearningRate 0.0699   Epoch: 3   Global Step: 40750   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:33:58,275-Speed 3115.47 samples/sec   Loss 12.5510   LearningRate 0.0699   Epoch: 3   Global Step: 40760   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:01,601-Speed 3079.17 samples/sec   Loss 12.5595   LearningRate 0.0699   Epoch: 3   Global Step: 40770   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:04,967-Speed 3044.04 samples/sec   Loss 12.4887   LearningRate 0.0699   Epoch: 3   Global Step: 40780   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:08,288-Speed 3084.10 samples/sec   Loss 12.4942   LearningRate 0.0699   Epoch: 3   Global Step: 40790   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:11,640-Speed 3056.09 samples/sec   Loss 12.4054   LearningRate 0.0698   Epoch: 3   Global Step: 40800   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:15,014-Speed 3036.16 samples/sec   Loss 12.4532   LearningRate 0.0698   Epoch: 3   Global Step: 40810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:18,337-Speed 3081.78 samples/sec   Loss 12.6587   LearningRate 0.0698   Epoch: 3   Global Step: 40820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:21,708-Speed 3038.56 samples/sec   Loss 12.4574   LearningRate 0.0698   Epoch: 3   Global Step: 40830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:25,022-Speed 3090.84 samples/sec   Loss 12.5278   LearningRate 0.0698   Epoch: 3   Global Step: 40840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:28,350-Speed 3077.53 samples/sec   Loss 12.4959   LearningRate 0.0698   Epoch: 3   Global Step: 40850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:31,611-Speed 3142.91 samples/sec   Loss 12.5502   LearningRate 0.0698   Epoch: 3   Global Step: 40860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:34,936-Speed 3081.03 samples/sec   Loss 12.8110   LearningRate 0.0698   Epoch: 3   Global Step: 40870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:38,327-Speed 3021.07 samples/sec   Loss 12.6021   LearningRate 0.0698   Epoch: 3   Global Step: 40880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:41,678-Speed 3056.82 samples/sec   Loss 12.5015   LearningRate 0.0698   Epoch: 3   Global Step: 40890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:45,020-Speed 3064.50 samples/sec   Loss 12.5337   LearningRate 0.0698   Epoch: 3   Global Step: 40900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:48,402-Speed 3028.91 samples/sec   Loss 12.5619   LearningRate 0.0698   Epoch: 3   Global Step: 40910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:51,774-Speed 3037.44 samples/sec   Loss 12.6577   LearningRate 0.0698   Epoch: 3   Global Step: 40920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:55,051-Speed 3126.60 samples/sec   Loss 12.5650   LearningRate 0.0698   Epoch: 3   Global Step: 40930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:34:58,348-Speed 3106.21 samples/sec   Loss 12.4253   LearningRate 0.0698   Epoch: 3   Global Step: 40940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:01,683-Speed 3071.29 samples/sec   Loss 12.3574   LearningRate 0.0697   Epoch: 3   Global Step: 40950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:04,994-Speed 3093.78 samples/sec   Loss 12.6223   LearningRate 0.0697   Epoch: 3   Global Step: 40960   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:35:08,332-Speed 3068.94 samples/sec   Loss 12.4038   LearningRate 0.0697   Epoch: 3   Global Step: 40970   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:11,676-Speed 3062.62 samples/sec   Loss 12.4616   LearningRate 0.0697   Epoch: 3   Global Step: 40980   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:15,054-Speed 3031.96 samples/sec   Loss 12.5311   LearningRate 0.0697   Epoch: 3   Global Step: 40990   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:18,395-Speed 3066.02 samples/sec   Loss 12.5175   LearningRate 0.0697   Epoch: 3   Global Step: 41000   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:21,785-Speed 3021.66 samples/sec   Loss 12.5024   LearningRate 0.0697   Epoch: 3   Global Step: 41010   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:25,161-Speed 3034.60 samples/sec   Loss 12.4092   LearningRate 0.0697   Epoch: 3   Global Step: 41020   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:28,478-Speed 3087.79 samples/sec   Loss 12.5286   LearningRate 0.0697   Epoch: 3   Global Step: 41030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:31,777-Speed 3104.56 samples/sec   Loss 12.3709   LearningRate 0.0697   Epoch: 3   Global Step: 41040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:35,095-Speed 3087.26 samples/sec   Loss 12.5980   LearningRate 0.0697   Epoch: 3   Global Step: 41050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:38,397-Speed 3102.83 samples/sec   Loss 12.4571   LearningRate 0.0697   Epoch: 3   Global Step: 41060   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:41,724-Speed 3078.28 samples/sec   Loss 12.6438   LearningRate 0.0697   Epoch: 3   Global Step: 41070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:45,076-Speed 3055.43 samples/sec   Loss 12.5423   LearningRate 0.0697   Epoch: 3   Global Step: 41080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:48,407-Speed 3074.80 samples/sec   Loss 12.4374   LearningRate 0.0697   Epoch: 3   Global Step: 41090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:51,745-Speed 3069.13 samples/sec   Loss 12.4508   LearningRate 0.0696   Epoch: 3   Global Step: 41100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:55,094-Speed 3058.18 samples/sec   Loss 12.5442   LearningRate 0.0696   Epoch: 3   Global Step: 41110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:35:58,476-Speed 3028.88 samples/sec   Loss 12.5390   LearningRate 0.0696   Epoch: 3   Global Step: 41120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:01,797-Speed 3084.42 samples/sec   Loss 12.5305   LearningRate 0.0696   Epoch: 3   Global Step: 41130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:05,111-Speed 3090.10 samples/sec   Loss 12.5159   LearningRate 0.0696   Epoch: 3   Global Step: 41140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:08,516-Speed 3009.34 samples/sec   Loss 12.4193   LearningRate 0.0696   Epoch: 3   Global Step: 41150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:11,879-Speed 3044.81 samples/sec   Loss 12.5820   LearningRate 0.0696   Epoch: 3   Global Step: 41160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:15,231-Speed 3055.81 samples/sec   Loss 12.5696   LearningRate 0.0696   Epoch: 3   Global Step: 41170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:18,585-Speed 3054.20 samples/sec   Loss 12.4668   LearningRate 0.0696   Epoch: 3   Global Step: 41180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:21,909-Speed 3082.16 samples/sec   Loss 12.6295   LearningRate 0.0696   Epoch: 3   Global Step: 41190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:25,185-Speed 3125.94 samples/sec   Loss 12.4917   LearningRate 0.0696   Epoch: 3   Global Step: 41200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:28,490-Speed 3099.33 samples/sec   Loss 12.4442   LearningRate 0.0696   Epoch: 3   Global Step: 41210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:31,861-Speed 3038.64 samples/sec   Loss 12.5308   LearningRate 0.0696   Epoch: 3   Global Step: 41220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:35,238-Speed 3032.67 samples/sec   Loss 12.5062   LearningRate 0.0696   Epoch: 3   Global Step: 41230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:38,592-Speed 3054.48 samples/sec   Loss 12.5859   LearningRate 0.0696   Epoch: 3   Global Step: 41240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:41,915-Speed 3082.32 samples/sec   Loss 12.6239   LearningRate 0.0695   Epoch: 3   Global Step: 41250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:45,273-Speed 3050.05 samples/sec   Loss 12.5059   LearningRate 0.0695   Epoch: 3   Global Step: 41260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:48,597-Speed 3081.63 samples/sec   Loss 12.6391   LearningRate 0.0695   Epoch: 3   Global Step: 41270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:36:51,906-Speed 3096.26 samples/sec   Loss 12.5890   LearningRate 0.0695   Epoch: 3   Global Step: 41280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:36:55,275-Speed 3039.87 samples/sec   Loss 12.4169   LearningRate 0.0695   Epoch: 3   Global Step: 41290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:36:58,543-Speed 3134.54 samples/sec   Loss 12.5215   LearningRate 0.0695   Epoch: 3   Global Step: 41300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:37:01,840-Speed 3107.18 samples/sec   Loss 12.3687   LearningRate 0.0695   Epoch: 3   Global Step: 41310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:37:05,190-Speed 3057.01 samples/sec   Loss 12.3843   LearningRate 0.0695   Epoch: 3   Global Step: 41320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:37:08,533-Speed 3064.71 samples/sec   Loss 12.5494   LearningRate 0.0695   Epoch: 3   Global Step: 41330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:37:11,838-Speed 3098.71 samples/sec   Loss 12.3671   LearningRate 0.0695   Epoch: 3   Global Step: 41340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:37:15,139-Speed 3102.76 samples/sec   Loss 12.4928   LearningRate 0.0695   Epoch: 3   Global Step: 41350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:37:18,451-Speed 3093.04 samples/sec   Loss 12.5024   LearningRate 0.0695   Epoch: 3   Global Step: 41360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:37:21,752-Speed 3103.11 samples/sec   Loss 12.5313   LearningRate 0.0695   Epoch: 3   Global Step: 41370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:37:25,076-Speed 3081.58 samples/sec   Loss 12.5466   LearningRate 0.0695   Epoch: 3   Global Step: 41380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:37:28,407-Speed 3075.09 samples/sec   Loss 12.4450   LearningRate 0.0695   Epoch: 3   Global Step: 41390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:37:31,795-Speed 3022.91 samples/sec   Loss 12.5913   LearningRate 0.0694   Epoch: 3   Global Step: 41400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:37:35,189-Speed 3018.56 samples/sec   Loss 12.2726   LearningRate 0.0694   Epoch: 3   Global Step: 41410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:37:38,544-Speed 3053.51 samples/sec   Loss 12.5175   LearningRate 0.0694   Epoch: 3   Global Step: 41420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:37:41,871-Speed 3077.88 samples/sec   Loss 12.6453   LearningRate 0.0694   Epoch: 3   Global Step: 41430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:37:45,214-Speed 3063.70 samples/sec   Loss 12.4236   LearningRate 0.0694   Epoch: 3   Global Step: 41440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:37:48,569-Speed 3053.29 samples/sec   Loss 12.5604   LearningRate 0.0694   Epoch: 3   Global Step: 41450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:37:51,899-Speed 3076.41 samples/sec   Loss 12.4828   LearningRate 0.0694   Epoch: 3   Global Step: 41460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:37:55,323-Speed 2990.81 samples/sec   Loss 12.4478   LearningRate 0.0694   Epoch: 3   Global Step: 41470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:37:58,652-Speed 3077.25 samples/sec   Loss 12.6363   LearningRate 0.0694   Epoch: 3   Global Step: 41480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:01,994-Speed 3064.66 samples/sec   Loss 12.4834   LearningRate 0.0694   Epoch: 3   Global Step: 41490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:05,356-Speed 3047.02 samples/sec   Loss 12.4164   LearningRate 0.0694   Epoch: 3   Global Step: 41500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:08,708-Speed 3056.60 samples/sec   Loss 12.5189   LearningRate 0.0694   Epoch: 3   Global Step: 41510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:12,024-Speed 3088.69 samples/sec   Loss 12.4876   LearningRate 0.0694   Epoch: 3   Global Step: 41520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:15,401-Speed 3033.57 samples/sec   Loss 12.6728   LearningRate 0.0694   Epoch: 3   Global Step: 41530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:18,709-Speed 3096.47 samples/sec   Loss 12.4777   LearningRate 0.0694   Epoch: 3   Global Step: 41540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:22,021-Speed 3092.84 samples/sec   Loss 12.5734   LearningRate 0.0693   Epoch: 3   Global Step: 41550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:25,401-Speed 3030.16 samples/sec   Loss 12.4624   LearningRate 0.0693   Epoch: 3   Global Step: 41560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:28,747-Speed 3061.47 samples/sec   Loss 12.6429   LearningRate 0.0693   Epoch: 3   Global Step: 41570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:32,097-Speed 3057.38 samples/sec   Loss 12.5839   LearningRate 0.0693   Epoch: 3   Global Step: 41580   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:38:35,369-Speed 3130.02 samples/sec   Loss 12.6168   LearningRate 0.0693   Epoch: 3   Global Step: 41590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:38,643-Speed 3128.95 samples/sec   Loss 12.3803   LearningRate 0.0693   Epoch: 3   Global Step: 41600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:41,938-Speed 3108.36 samples/sec   Loss 12.5064   LearningRate 0.0693   Epoch: 3   Global Step: 41610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:45,239-Speed 3103.56 samples/sec   Loss 12.4928   LearningRate 0.0693   Epoch: 3   Global Step: 41620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:48,553-Speed 3090.55 samples/sec   Loss 12.5363   LearningRate 0.0693   Epoch: 3   Global Step: 41630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:51,852-Speed 3104.77 samples/sec   Loss 12.3706   LearningRate 0.0693   Epoch: 3   Global Step: 41640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:55,181-Speed 3077.68 samples/sec   Loss 12.6176   LearningRate 0.0693   Epoch: 3   Global Step: 41650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:38:58,501-Speed 3085.21 samples/sec   Loss 12.5219   LearningRate 0.0693   Epoch: 3   Global Step: 41660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:01,820-Speed 3086.33 samples/sec   Loss 12.4497   LearningRate 0.0693   Epoch: 3   Global Step: 41670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:05,193-Speed 3036.26 samples/sec   Loss 12.4516   LearningRate 0.0693   Epoch: 3   Global Step: 41680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:08,460-Speed 3135.42 samples/sec   Loss 12.5876   LearningRate 0.0693   Epoch: 3   Global Step: 41690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:11,779-Speed 3086.50 samples/sec   Loss 12.5585   LearningRate 0.0692   Epoch: 3   Global Step: 41700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:15,062-Speed 3119.10 samples/sec   Loss 12.6022   LearningRate 0.0692   Epoch: 3   Global Step: 41710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:18,387-Speed 3081.25 samples/sec   Loss 12.5900   LearningRate 0.0692   Epoch: 3   Global Step: 41720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:21,708-Speed 3084.19 samples/sec   Loss 12.3372   LearningRate 0.0692   Epoch: 3   Global Step: 41730   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:25,094-Speed 3024.75 samples/sec   Loss 12.5940   LearningRate 0.0692   Epoch: 3   Global Step: 41740   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:28,438-Speed 3062.92 samples/sec   Loss 12.4101   LearningRate 0.0692   Epoch: 3   Global Step: 41750   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:31,785-Speed 3060.49 samples/sec   Loss 12.4045   LearningRate 0.0692   Epoch: 3   Global Step: 41760   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:35,279-Speed 2931.67 samples/sec   Loss 12.5294   LearningRate 0.0692   Epoch: 3   Global Step: 41770   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:38,617-Speed 3068.97 samples/sec   Loss 12.5552   LearningRate 0.0692   Epoch: 3   Global Step: 41780   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:41,981-Speed 3044.86 samples/sec   Loss 12.5207   LearningRate 0.0692   Epoch: 3   Global Step: 41790   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:45,307-Speed 3079.48 samples/sec   Loss 12.4532   LearningRate 0.0692   Epoch: 3   Global Step: 41800   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:48,691-Speed 3027.41 samples/sec   Loss 12.5153   LearningRate 0.0692   Epoch: 3   Global Step: 41810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:52,064-Speed 3036.44 samples/sec   Loss 12.4807   LearningRate 0.0692   Epoch: 3   Global Step: 41820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:55,394-Speed 3076.79 samples/sec   Loss 12.5050   LearningRate 0.0692   Epoch: 3   Global Step: 41830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:39:58,728-Speed 3072.26 samples/sec   Loss 12.5754   LearningRate 0.0692   Epoch: 3   Global Step: 41840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:02,063-Speed 3071.16 samples/sec   Loss 12.2875   LearningRate 0.0691   Epoch: 3   Global Step: 41850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:05,354-Speed 3112.18 samples/sec   Loss 12.5046   LearningRate 0.0691   Epoch: 3   Global Step: 41860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:08,634-Speed 3123.68 samples/sec   Loss 12.3587   LearningRate 0.0691   Epoch: 3   Global Step: 41870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:11,953-Speed 3085.57 samples/sec   Loss 12.4314   LearningRate 0.0691   Epoch: 3   Global Step: 41880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:15,290-Speed 3070.03 samples/sec   Loss 12.3890   LearningRate 0.0691   Epoch: 3   Global Step: 41890   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:40:18,622-Speed 3073.91 samples/sec   Loss 12.5360   LearningRate 0.0691   Epoch: 3   Global Step: 41900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:21,895-Speed 3129.92 samples/sec   Loss 12.5962   LearningRate 0.0691   Epoch: 3   Global Step: 41910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:25,254-Speed 3049.86 samples/sec   Loss 12.3088   LearningRate 0.0691   Epoch: 3   Global Step: 41920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:28,579-Speed 3079.87 samples/sec   Loss 12.4544   LearningRate 0.0691   Epoch: 3   Global Step: 41930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:31,873-Speed 3109.60 samples/sec   Loss 12.5174   LearningRate 0.0691   Epoch: 3   Global Step: 41940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:35,201-Speed 3078.54 samples/sec   Loss 12.4370   LearningRate 0.0691   Epoch: 3   Global Step: 41950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:38,490-Speed 3114.17 samples/sec   Loss 12.4367   LearningRate 0.0691   Epoch: 3   Global Step: 41960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:41,865-Speed 3034.56 samples/sec   Loss 12.3345   LearningRate 0.0691   Epoch: 3   Global Step: 41970   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:45,186-Speed 3084.47 samples/sec   Loss 12.3982   LearningRate 0.0691   Epoch: 3   Global Step: 41980   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:48,487-Speed 3102.70 samples/sec   Loss 12.3924   LearningRate 0.0691   Epoch: 3   Global Step: 41990   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:51,767-Speed 3122.87 samples/sec   Loss 12.5216   LearningRate 0.0690   Epoch: 3   Global Step: 42000   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:55,096-Speed 3077.06 samples/sec   Loss 12.6008   LearningRate 0.0690   Epoch: 3   Global Step: 42010   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:40:58,424-Speed 3077.29 samples/sec   Loss 12.5984   LearningRate 0.0690   Epoch: 3   Global Step: 42020   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:01,746-Speed 3083.79 samples/sec   Loss 12.4978   LearningRate 0.0690   Epoch: 3   Global Step: 42030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:05,064-Speed 3087.06 samples/sec   Loss 12.3646   LearningRate 0.0690   Epoch: 3   Global Step: 42040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:08,391-Speed 3079.07 samples/sec   Loss 12.3181   LearningRate 0.0690   Epoch: 3   Global Step: 42050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:11,696-Speed 3099.19 samples/sec   Loss 12.5454   LearningRate 0.0690   Epoch: 3   Global Step: 42060   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:15,080-Speed 3026.49 samples/sec   Loss 12.3721   LearningRate 0.0690   Epoch: 3   Global Step: 42070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:18,377-Speed 3107.13 samples/sec   Loss 12.5925   LearningRate 0.0690   Epoch: 3   Global Step: 42080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:21,695-Speed 3086.93 samples/sec   Loss 12.4077   LearningRate 0.0690   Epoch: 3   Global Step: 42090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:24,994-Speed 3105.03 samples/sec   Loss 12.4697   LearningRate 0.0690   Epoch: 3   Global Step: 42100   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:41:28,346-Speed 3055.69 samples/sec   Loss 12.3590   LearningRate 0.0690   Epoch: 3   Global Step: 42110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:31,651-Speed 3099.76 samples/sec   Loss 12.6028   LearningRate 0.0690   Epoch: 3   Global Step: 42120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:34,973-Speed 3083.53 samples/sec   Loss 12.4327   LearningRate 0.0690   Epoch: 3   Global Step: 42130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:38,343-Speed 3039.11 samples/sec   Loss 12.4501   LearningRate 0.0690   Epoch: 3   Global Step: 42140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:41,691-Speed 3059.80 samples/sec   Loss 12.3631   LearningRate 0.0689   Epoch: 3   Global Step: 42150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:44,981-Speed 3113.12 samples/sec   Loss 12.4425   LearningRate 0.0689   Epoch: 3   Global Step: 42160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:48,285-Speed 3100.14 samples/sec   Loss 12.4039   LearningRate 0.0689   Epoch: 3   Global Step: 42170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:51,599-Speed 3091.42 samples/sec   Loss 12.4345   LearningRate 0.0689   Epoch: 3   Global Step: 42180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:54,947-Speed 3059.02 samples/sec   Loss 12.6959   LearningRate 0.0689   Epoch: 3   Global Step: 42190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:41:58,288-Speed 3065.66 samples/sec   Loss 12.4036   LearningRate 0.0689   Epoch: 3   Global Step: 42200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:01,566-Speed 3124.92 samples/sec   Loss 12.5196   LearningRate 0.0689   Epoch: 3   Global Step: 42210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:04,890-Speed 3081.47 samples/sec   Loss 12.4163   LearningRate 0.0689   Epoch: 3   Global Step: 42220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:08,193-Speed 3101.52 samples/sec   Loss 12.3810   LearningRate 0.0689   Epoch: 3   Global Step: 42230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:11,592-Speed 3012.97 samples/sec   Loss 12.6546   LearningRate 0.0689   Epoch: 3   Global Step: 42240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:14,979-Speed 3024.06 samples/sec   Loss 12.7075   LearningRate 0.0689   Epoch: 3   Global Step: 42250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:18,295-Speed 3089.66 samples/sec   Loss 12.6464   LearningRate 0.0689   Epoch: 3   Global Step: 42260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:21,657-Speed 3046.31 samples/sec   Loss 12.5487   LearningRate 0.0689   Epoch: 3   Global Step: 42270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:25,006-Speed 3058.38 samples/sec   Loss 12.4510   LearningRate 0.0689   Epoch: 3   Global Step: 42280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:28,297-Speed 3112.28 samples/sec   Loss 12.4322   LearningRate 0.0689   Epoch: 3   Global Step: 42290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:31,622-Speed 3081.32 samples/sec   Loss 12.4643   LearningRate 0.0688   Epoch: 3   Global Step: 42300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:35,005-Speed 3027.74 samples/sec   Loss 12.5288   LearningRate 0.0688   Epoch: 3   Global Step: 42310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:38,346-Speed 3066.17 samples/sec   Loss 12.5297   LearningRate 0.0688   Epoch: 3   Global Step: 42320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:41,650-Speed 3100.31 samples/sec   Loss 12.3811   LearningRate 0.0688   Epoch: 3   Global Step: 42330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:44,979-Speed 3076.97 samples/sec   Loss 12.4794   LearningRate 0.0688   Epoch: 3   Global Step: 42340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:48,355-Speed 3033.95 samples/sec   Loss 12.5707   LearningRate 0.0688   Epoch: 3   Global Step: 42350   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:51,694-Speed 3066.99 samples/sec   Loss 12.3846   LearningRate 0.0688   Epoch: 3   Global Step: 42360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:54,971-Speed 3125.77 samples/sec   Loss 12.4327   LearningRate 0.0688   Epoch: 3   Global Step: 42370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:42:58,316-Speed 3062.44 samples/sec   Loss 12.5351   LearningRate 0.0688   Epoch: 3   Global Step: 42380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:43:01,594-Speed 3124.45 samples/sec   Loss 12.5911   LearningRate 0.0688   Epoch: 3   Global Step: 42390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:43:04,932-Speed 3069.09 samples/sec   Loss 12.5524   LearningRate 0.0688   Epoch: 3   Global Step: 42400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:43:08,314-Speed 3028.35 samples/sec   Loss 12.3494   LearningRate 0.0688   Epoch: 3   Global Step: 42410   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:43:11,683-Speed 3040.26 samples/sec   Loss 12.3815   LearningRate 0.0688   Epoch: 3   Global Step: 42420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:43:15,047-Speed 3045.24 samples/sec   Loss 12.4485   LearningRate 0.0688   Epoch: 3   Global Step: 42430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:43:18,441-Speed 3017.88 samples/sec   Loss 12.6118   LearningRate 0.0688   Epoch: 3   Global Step: 42440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:43:21,786-Speed 3062.25 samples/sec   Loss 12.4588   LearningRate 0.0687   Epoch: 3   Global Step: 42450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:43:25,168-Speed 3029.45 samples/sec   Loss 12.4744   LearningRate 0.0687   Epoch: 3   Global Step: 42460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:43:28,543-Speed 3034.62 samples/sec   Loss 12.3774   LearningRate 0.0687   Epoch: 3   Global Step: 42470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:43:31,866-Speed 3082.29 samples/sec   Loss 12.5490   LearningRate 0.0687   Epoch: 3   Global Step: 42480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:43:35,119-Speed 3148.92 samples/sec   Loss 12.6097   LearningRate 0.0687   Epoch: 3   Global Step: 42490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:43:38,493-Speed 3035.68 samples/sec   Loss 12.4364   LearningRate 0.0687   Epoch: 3   Global Step: 42500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:43:41,804-Speed 3094.50 samples/sec   Loss 12.5154   LearningRate 0.0687   Epoch: 3   Global Step: 42510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:43:45,150-Speed 3060.59 samples/sec   Loss 12.3613   LearningRate 0.0687   Epoch: 3   Global Step: 42520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:43:48,462-Speed 3093.18 samples/sec   Loss 12.4787   LearningRate 0.0687   Epoch: 3   Global Step: 42530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:43:51,795-Speed 3072.41 samples/sec   Loss 12.5066   LearningRate 0.0687   Epoch: 3   Global Step: 42540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:43:55,161-Speed 3043.48 samples/sec   Loss 12.5179   LearningRate 0.0687   Epoch: 3   Global Step: 42550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:43:58,501-Speed 3067.22 samples/sec   Loss 12.4877   LearningRate 0.0687   Epoch: 3   Global Step: 42560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:44:01,864-Speed 3045.31 samples/sec   Loss 12.3858   LearningRate 0.0687   Epoch: 3   Global Step: 42570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:44:05,215-Speed 3057.17 samples/sec   Loss 12.4607   LearningRate 0.0687   Epoch: 3   Global Step: 42580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:44:08,530-Speed 3089.92 samples/sec   Loss 12.4162   LearningRate 0.0687   Epoch: 3   Global Step: 42590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:11,812-Speed 3120.88 samples/sec   Loss 12.4441   LearningRate 0.0686   Epoch: 3   Global Step: 42600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:15,199-Speed 3024.04 samples/sec   Loss 12.3814   LearningRate 0.0686   Epoch: 3   Global Step: 42610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:18,569-Speed 3039.37 samples/sec   Loss 12.4682   LearningRate 0.0686   Epoch: 3   Global Step: 42620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:21,922-Speed 3055.15 samples/sec   Loss 12.5228   LearningRate 0.0686   Epoch: 3   Global Step: 42630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:25,308-Speed 3024.63 samples/sec   Loss 12.2618   LearningRate 0.0686   Epoch: 3   Global Step: 42640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:28,684-Speed 3034.54 samples/sec   Loss 12.4633   LearningRate 0.0686   Epoch: 3   Global Step: 42650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:31,984-Speed 3104.45 samples/sec   Loss 12.4418   LearningRate 0.0686   Epoch: 3   Global Step: 42660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:35,269-Speed 3117.31 samples/sec   Loss 12.3706   LearningRate 0.0686   Epoch: 3   Global Step: 42670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:38,587-Speed 3087.85 samples/sec   Loss 12.4898   LearningRate 0.0686   Epoch: 3   Global Step: 42680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:41,991-Speed 3008.59 samples/sec   Loss 12.4300   LearningRate 0.0686   Epoch: 3   Global Step: 42690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:45,314-Speed 3082.07 samples/sec   Loss 12.4032   LearningRate 0.0686   Epoch: 3   Global Step: 42700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:48,649-Speed 3071.88 samples/sec   Loss 12.5091   LearningRate 0.0686   Epoch: 3   Global Step: 42710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:51,963-Speed 3090.72 samples/sec   Loss 12.4249   LearningRate 0.0686   Epoch: 3   Global Step: 42720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:55,291-Speed 3077.74 samples/sec   Loss 12.3208   LearningRate 0.0686   Epoch: 3   Global Step: 42730   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:44:58,640-Speed 3060.73 samples/sec   Loss 12.4105   LearningRate 0.0686   Epoch: 3   Global Step: 42740   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:45:01,962-Speed 3083.45 samples/sec   Loss 12.4678   LearningRate 0.0685   Epoch: 3   Global Step: 42750   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:45:05,333-Speed 3038.59 samples/sec   Loss 12.3278   LearningRate 0.0685   Epoch: 3   Global Step: 42760   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:45:08,620-Speed 3116.23 samples/sec   Loss 12.5075   LearningRate 0.0685   Epoch: 3   Global Step: 42770   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:45:11,934-Speed 3090.15 samples/sec   Loss 12.4322   LearningRate 0.0685   Epoch: 3   Global Step: 42780   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:45:15,229-Speed 3109.70 samples/sec   Loss 12.3752   LearningRate 0.0685   Epoch: 3   Global Step: 42790   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:45:18,519-Speed 3113.05 samples/sec   Loss 12.3770   LearningRate 0.0685   Epoch: 3   Global Step: 42800   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:45:21,911-Speed 3019.74 samples/sec   Loss 12.4023   LearningRate 0.0685   Epoch: 3   Global Step: 42810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:45:25,271-Speed 3048.64 samples/sec   Loss 12.6107   LearningRate 0.0685   Epoch: 3   Global Step: 42820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:45:28,666-Speed 3017.81 samples/sec   Loss 12.4023   LearningRate 0.0685   Epoch: 3   Global Step: 42830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:45:32,039-Speed 3036.26 samples/sec   Loss 12.4476   LearningRate 0.0685   Epoch: 3   Global Step: 42840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:45:35,366-Speed 3078.91 samples/sec   Loss 12.5108   LearningRate 0.0685   Epoch: 3   Global Step: 42850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:45:38,746-Speed 3030.59 samples/sec   Loss 12.4251   LearningRate 0.0685   Epoch: 3   Global Step: 42860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:45:42,076-Speed 3076.97 samples/sec   Loss 12.3388   LearningRate 0.0685   Epoch: 3   Global Step: 42870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:45:45,440-Speed 3044.59 samples/sec   Loss 12.3593   LearningRate 0.0685   Epoch: 3   Global Step: 42880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:45:48,731-Speed 3112.15 samples/sec   Loss 12.5631   LearningRate 0.0685   Epoch: 3   Global Step: 42890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:45:52,045-Speed 3091.39 samples/sec   Loss 12.3737   LearningRate 0.0684   Epoch: 3   Global Step: 42900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:45:55,358-Speed 3091.20 samples/sec   Loss 12.2157   LearningRate 0.0684   Epoch: 3   Global Step: 42910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:45:58,686-Speed 3079.77 samples/sec   Loss 12.4303   LearningRate 0.0684   Epoch: 3   Global Step: 42920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:46:02,012-Speed 3080.22 samples/sec   Loss 12.3957   LearningRate 0.0684   Epoch: 3   Global Step: 42930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:46:05,341-Speed 3076.70 samples/sec   Loss 12.4276   LearningRate 0.0684   Epoch: 3   Global Step: 42940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:46:08,692-Speed 3057.14 samples/sec   Loss 12.5606   LearningRate 0.0684   Epoch: 3   Global Step: 42950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:46:12,063-Speed 3040.31 samples/sec   Loss 12.4345   LearningRate 0.0684   Epoch: 3   Global Step: 42960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:46:15,385-Speed 3083.40 samples/sec   Loss 12.3115   LearningRate 0.0684   Epoch: 3   Global Step: 42970   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:18,746-Speed 3047.35 samples/sec   Loss 12.4215   LearningRate 0.0684   Epoch: 3   Global Step: 42980   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:22,130-Speed 3026.95 samples/sec   Loss 12.3781   LearningRate 0.0684   Epoch: 3   Global Step: 42990   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:25,446-Speed 3088.79 samples/sec   Loss 12.3317   LearningRate 0.0684   Epoch: 3   Global Step: 43000   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:28,773-Speed 3079.02 samples/sec   Loss 12.5736   LearningRate 0.0684   Epoch: 3   Global Step: 43010   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:32,052-Speed 3123.81 samples/sec   Loss 12.4130   LearningRate 0.0684   Epoch: 3   Global Step: 43020   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:35,333-Speed 3122.17 samples/sec   Loss 12.4619   LearningRate 0.0684   Epoch: 3   Global Step: 43030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:38,695-Speed 3046.47 samples/sec   Loss 12.5018   LearningRate 0.0684   Epoch: 3   Global Step: 43040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:42,074-Speed 3031.31 samples/sec   Loss 12.4018   LearningRate 0.0683   Epoch: 3   Global Step: 43050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:45,374-Speed 3104.54 samples/sec   Loss 12.4346   LearningRate 0.0683   Epoch: 3   Global Step: 43060   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:48,659-Speed 3118.10 samples/sec   Loss 12.2570   LearningRate 0.0683   Epoch: 3   Global Step: 43070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:52,020-Speed 3047.01 samples/sec   Loss 12.5681   LearningRate 0.0683   Epoch: 3   Global Step: 43080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:55,338-Speed 3087.37 samples/sec   Loss 12.4149   LearningRate 0.0683   Epoch: 3   Global Step: 43090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:46:58,648-Speed 3094.87 samples/sec   Loss 12.3969   LearningRate 0.0683   Epoch: 3   Global Step: 43100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:47:01,958-Speed 3094.49 samples/sec   Loss 12.3858   LearningRate 0.0683   Epoch: 3   Global Step: 43110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:47:05,273-Speed 3090.43 samples/sec   Loss 12.4800   LearningRate 0.0683   Epoch: 3   Global Step: 43120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:47:08,626-Speed 3054.00 samples/sec   Loss 12.3393   LearningRate 0.0683   Epoch: 3   Global Step: 43130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:47:11,943-Speed 3088.18 samples/sec   Loss 12.4256   LearningRate 0.0683   Epoch: 3   Global Step: 43140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:47:15,240-Speed 3107.45 samples/sec   Loss 12.5309   LearningRate 0.0683   Epoch: 3   Global Step: 43150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:47:18,572-Speed 3073.19 samples/sec   Loss 12.5212   LearningRate 0.0683   Epoch: 3   Global Step: 43160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:47:21,999-Speed 2989.41 samples/sec   Loss 12.4094   LearningRate 0.0683   Epoch: 3   Global Step: 43170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:47:25,306-Speed 3097.50 samples/sec   Loss 12.4150   LearningRate 0.0683   Epoch: 3   Global Step: 43180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:47:28,624-Speed 3087.18 samples/sec   Loss 12.2855   LearningRate 0.0683   Epoch: 3   Global Step: 43190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:47:31,939-Speed 3089.56 samples/sec   Loss 12.2974   LearningRate 0.0682   Epoch: 3   Global Step: 43200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:47:35,255-Speed 3089.10 samples/sec   Loss 12.2374   LearningRate 0.0682   Epoch: 3   Global Step: 43210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:47:38,600-Speed 3061.97 samples/sec   Loss 12.3599   LearningRate 0.0682   Epoch: 3   Global Step: 43220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:47:41,901-Speed 3103.39 samples/sec   Loss 12.4862   LearningRate 0.0682   Epoch: 3   Global Step: 43230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:47:45,185-Speed 3119.09 samples/sec   Loss 12.4483   LearningRate 0.0682   Epoch: 3   Global Step: 43240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:47:48,508-Speed 3083.00 samples/sec   Loss 12.4753   LearningRate 0.0682   Epoch: 3   Global Step: 43250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:47:51,852-Speed 3062.93 samples/sec   Loss 12.4780   LearningRate 0.0682   Epoch: 3   Global Step: 43260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:47:55,159-Speed 3097.81 samples/sec   Loss 12.4644   LearningRate 0.0682   Epoch: 3   Global Step: 43270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:47:58,473-Speed 3090.67 samples/sec   Loss 12.2041   LearningRate 0.0682   Epoch: 3   Global Step: 43280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:01,775-Speed 3102.25 samples/sec   Loss 12.3300   LearningRate 0.0682   Epoch: 3   Global Step: 43290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:05,090-Speed 3089.46 samples/sec   Loss 12.3596   LearningRate 0.0682   Epoch: 3   Global Step: 43300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:08,426-Speed 3071.26 samples/sec   Loss 12.4623   LearningRate 0.0682   Epoch: 3   Global Step: 43310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:11,699-Speed 3129.07 samples/sec   Loss 12.4280   LearningRate 0.0682   Epoch: 3   Global Step: 43320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:15,003-Speed 3100.37 samples/sec   Loss 12.3562   LearningRate 0.0682   Epoch: 3   Global Step: 43330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:18,357-Speed 3053.50 samples/sec   Loss 12.3980   LearningRate 0.0682   Epoch: 3   Global Step: 43340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:21,659-Speed 3101.97 samples/sec   Loss 12.4919   LearningRate 0.0681   Epoch: 3   Global Step: 43350   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:24,975-Speed 3089.41 samples/sec   Loss 12.4273   LearningRate 0.0681   Epoch: 3   Global Step: 43360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:28,342-Speed 3041.80 samples/sec   Loss 12.3533   LearningRate 0.0681   Epoch: 3   Global Step: 43370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:31,663-Speed 3085.27 samples/sec   Loss 12.3494   LearningRate 0.0681   Epoch: 3   Global Step: 43380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:34,989-Speed 3079.02 samples/sec   Loss 12.3811   LearningRate 0.0681   Epoch: 3   Global Step: 43390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:38,349-Speed 3049.32 samples/sec   Loss 12.4373   LearningRate 0.0681   Epoch: 3   Global Step: 43400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:41,750-Speed 3011.43 samples/sec   Loss 12.2182   LearningRate 0.0681   Epoch: 3   Global Step: 43410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:45,029-Speed 3123.88 samples/sec   Loss 12.1821   LearningRate 0.0681   Epoch: 3   Global Step: 43420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:48,323-Speed 3108.90 samples/sec   Loss 12.3016   LearningRate 0.0681   Epoch: 3   Global Step: 43430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:51,667-Speed 3063.62 samples/sec   Loss 12.5779   LearningRate 0.0681   Epoch: 3   Global Step: 43440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:54,983-Speed 3088.61 samples/sec   Loss 12.4481   LearningRate 0.0681   Epoch: 3   Global Step: 43450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:48:58,360-Speed 3033.21 samples/sec   Loss 12.2661   LearningRate 0.0681   Epoch: 3   Global Step: 43460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:49:01,718-Speed 3050.82 samples/sec   Loss 12.3220   LearningRate 0.0681   Epoch: 3   Global Step: 43470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:49:04,996-Speed 3124.70 samples/sec   Loss 12.2238   LearningRate 0.0681   Epoch: 3   Global Step: 43480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:49:08,306-Speed 3094.66 samples/sec   Loss 12.4699   LearningRate 0.0681   Epoch: 3   Global Step: 43490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:49:11,670-Speed 3044.60 samples/sec   Loss 12.3322   LearningRate 0.0680   Epoch: 3   Global Step: 43500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:49:14,981-Speed 3094.32 samples/sec   Loss 12.3565   LearningRate 0.0680   Epoch: 3   Global Step: 43510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:49:18,347-Speed 3043.02 samples/sec   Loss 12.4391   LearningRate 0.0680   Epoch: 3   Global Step: 43520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:49:21,730-Speed 3027.25 samples/sec   Loss 12.3698   LearningRate 0.0680   Epoch: 3   Global Step: 43530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:49:25,041-Speed 3093.94 samples/sec   Loss 12.4238   LearningRate 0.0680   Epoch: 3   Global Step: 43540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:49:28,381-Speed 3066.90 samples/sec   Loss 12.3941   LearningRate 0.0680   Epoch: 3   Global Step: 43550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:49:31,645-Speed 3137.73 samples/sec   Loss 12.2798   LearningRate 0.0680   Epoch: 3   Global Step: 43560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:49:34,993-Speed 3059.80 samples/sec   Loss 12.4989   LearningRate 0.0680   Epoch: 3   Global Step: 43570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:49:38,289-Speed 3107.92 samples/sec   Loss 12.4030   LearningRate 0.0680   Epoch: 3   Global Step: 43580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:49:41,634-Speed 3061.93 samples/sec   Loss 12.5533   LearningRate 0.0680   Epoch: 3   Global Step: 43590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:49:44,907-Speed 3130.07 samples/sec   Loss 12.4488   LearningRate 0.0680   Epoch: 3   Global Step: 43600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:49:48,256-Speed 3057.89 samples/sec   Loss 12.4151   LearningRate 0.0680   Epoch: 3   Global Step: 43610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:49:51,568-Speed 3092.73 samples/sec   Loss 12.4607   LearningRate 0.0680   Epoch: 3   Global Step: 43620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:49:55,024-Speed 2964.29 samples/sec   Loss 12.4766   LearningRate 0.0680   Epoch: 3   Global Step: 43630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:49:58,371-Speed 3059.95 samples/sec   Loss 12.3626   LearningRate 0.0680   Epoch: 3   Global Step: 43640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:50:01,691-Speed 3085.83 samples/sec   Loss 12.3715   LearningRate 0.0679   Epoch: 3   Global Step: 43650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:50:05,163-Speed 2950.13 samples/sec   Loss 12.3991   LearningRate 0.0679   Epoch: 3   Global Step: 43660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:50:08,508-Speed 3062.14 samples/sec   Loss 12.2719   LearningRate 0.0679   Epoch: 3   Global Step: 43670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:50:11,825-Speed 3087.71 samples/sec   Loss 12.2914   LearningRate 0.0679   Epoch: 3   Global Step: 43680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:50:15,217-Speed 3020.27 samples/sec   Loss 12.3624   LearningRate 0.0679   Epoch: 3   Global Step: 43690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:50:18,552-Speed 3070.86 samples/sec   Loss 12.3573   LearningRate 0.0679   Epoch: 3   Global Step: 43700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:50:21,940-Speed 3023.63 samples/sec   Loss 12.5662   LearningRate 0.0679   Epoch: 3   Global Step: 43710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:50:25,263-Speed 3082.80 samples/sec   Loss 12.4836   LearningRate 0.0679   Epoch: 3   Global Step: 43720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:50:28,653-Speed 3021.78 samples/sec   Loss 12.4643   LearningRate 0.0679   Epoch: 3   Global Step: 43730   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:50:32,003-Speed 3057.39 samples/sec   Loss 12.3458   LearningRate 0.0679   Epoch: 3   Global Step: 43740   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 05:50:35,365-Speed 3047.35 samples/sec   Loss 12.4674   LearningRate 0.0679   Epoch: 3   Global Step: 43750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 05:50:38,677-Speed 3092.26 samples/sec   Loss 12.4189   LearningRate 0.0679   Epoch: 3   Global Step: 43760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 05:50:42,026-Speed 3058.71 samples/sec   Loss 12.3055   LearningRate 0.0679   Epoch: 3   Global Step: 43770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 05:50:45,386-Speed 3047.97 samples/sec   Loss 12.3029   LearningRate 0.0679   Epoch: 3   Global Step: 43780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 05:50:48,820-Speed 2983.31 samples/sec   Loss 12.3614   LearningRate 0.0679   Epoch: 3   Global Step: 43790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 05:50:52,099-Speed 3123.63 samples/sec   Loss 12.2863   LearningRate 0.0678   Epoch: 3   Global Step: 43800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 05:50:55,466-Speed 3042.40 samples/sec   Loss 12.3672   LearningRate 0.0678   Epoch: 3   Global Step: 43810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 05:50:58,853-Speed 3023.88 samples/sec   Loss 12.3590   LearningRate 0.0678   Epoch: 3   Global Step: 43820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 05:51:02,225-Speed 3037.79 samples/sec   Loss 12.3355   LearningRate 0.0678   Epoch: 3   Global Step: 43830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 05:51:05,597-Speed 3037.82 samples/sec   Loss 12.2756   LearningRate 0.0678   Epoch: 3   Global Step: 43840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:51:08,926-Speed 3077.39 samples/sec   Loss 12.2757   LearningRate 0.0678   Epoch: 3   Global Step: 43850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:51:12,240-Speed 3090.28 samples/sec   Loss 12.3935   LearningRate 0.0678   Epoch: 3   Global Step: 43860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:51:15,578-Speed 3068.39 samples/sec   Loss 12.3286   LearningRate 0.0678   Epoch: 3   Global Step: 43870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:51:18,996-Speed 2997.41 samples/sec   Loss 12.3514   LearningRate 0.0678   Epoch: 3   Global Step: 43880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:51:22,351-Speed 3053.13 samples/sec   Loss 12.3303   LearningRate 0.0678   Epoch: 3   Global Step: 43890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:51:25,818-Speed 2954.04 samples/sec   Loss 12.3417   LearningRate 0.0678   Epoch: 3   Global Step: 43900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:51:29,186-Speed 3041.21 samples/sec   Loss 12.3248   LearningRate 0.0678   Epoch: 3   Global Step: 43910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:51:32,542-Speed 3052.44 samples/sec   Loss 12.3415   LearningRate 0.0678   Epoch: 3   Global Step: 43920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:51:35,946-Speed 3008.86 samples/sec   Loss 12.4423   LearningRate 0.0678   Epoch: 3   Global Step: 43930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:51:39,338-Speed 3019.58 samples/sec   Loss 12.2954   LearningRate 0.0678   Epoch: 3   Global Step: 43940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:51:42,765-Speed 2989.73 samples/sec   Loss 12.3772   LearningRate 0.0677   Epoch: 3   Global Step: 43950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:51:46,130-Speed 3044.12 samples/sec   Loss 12.4305   LearningRate 0.0677   Epoch: 3   Global Step: 43960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:51:49,491-Speed 3047.51 samples/sec   Loss 12.3943   LearningRate 0.0677   Epoch: 3   Global Step: 43970   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:51:52,790-Speed 3104.83 samples/sec   Loss 12.4189   LearningRate 0.0677   Epoch: 3   Global Step: 43980   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:51:56,065-Speed 3127.90 samples/sec   Loss 12.4352   LearningRate 0.0677   Epoch: 3   Global Step: 43990   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:51:59,425-Speed 3048.29 samples/sec   Loss 12.2694   LearningRate 0.0677   Epoch: 3   Global Step: 44000   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:02,744-Speed 3087.17 samples/sec   Loss 12.3484   LearningRate 0.0677   Epoch: 3   Global Step: 44010   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:06,095-Speed 3057.01 samples/sec   Loss 12.3098   LearningRate 0.0677   Epoch: 3   Global Step: 44020   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:09,460-Speed 3043.47 samples/sec   Loss 12.3853   LearningRate 0.0677   Epoch: 3   Global Step: 44030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:12,747-Speed 3116.40 samples/sec   Loss 12.2673   LearningRate 0.0677   Epoch: 3   Global Step: 44040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:16,021-Speed 3128.59 samples/sec   Loss 12.2009   LearningRate 0.0677   Epoch: 3   Global Step: 44050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:19,356-Speed 3071.56 samples/sec   Loss 12.4004   LearningRate 0.0677   Epoch: 3   Global Step: 44060   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:22,712-Speed 3052.79 samples/sec   Loss 12.2869   LearningRate 0.0677   Epoch: 3   Global Step: 44070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:26,026-Speed 3090.90 samples/sec   Loss 12.3020   LearningRate 0.0677   Epoch: 3   Global Step: 44080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:29,336-Speed 3094.49 samples/sec   Loss 12.4433   LearningRate 0.0677   Epoch: 3   Global Step: 44090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:32,686-Speed 3057.83 samples/sec   Loss 12.4162   LearningRate 0.0676   Epoch: 3   Global Step: 44100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:36,033-Speed 3060.03 samples/sec   Loss 12.2311   LearningRate 0.0676   Epoch: 3   Global Step: 44110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:39,444-Speed 3003.14 samples/sec   Loss 12.3081   LearningRate 0.0676   Epoch: 3   Global Step: 44120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:42,750-Speed 3098.38 samples/sec   Loss 12.2695   LearningRate 0.0676   Epoch: 3   Global Step: 44130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:46,055-Speed 3099.24 samples/sec   Loss 12.2862   LearningRate 0.0676   Epoch: 3   Global Step: 44140   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:52:49,327-Speed 3130.63 samples/sec   Loss 12.3190   LearningRate 0.0676   Epoch: 3   Global Step: 44150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:52,689-Speed 3046.92 samples/sec   Loss 12.3968   LearningRate 0.0676   Epoch: 3   Global Step: 44160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:56,023-Speed 3071.86 samples/sec   Loss 12.2713   LearningRate 0.0676   Epoch: 3   Global Step: 44170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:52:59,324-Speed 3103.30 samples/sec   Loss 12.4444   LearningRate 0.0676   Epoch: 3   Global Step: 44180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:02,660-Speed 3071.12 samples/sec   Loss 12.3069   LearningRate 0.0676   Epoch: 3   Global Step: 44190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:06,003-Speed 3063.95 samples/sec   Loss 12.5187   LearningRate 0.0676   Epoch: 3   Global Step: 44200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:09,321-Speed 3086.80 samples/sec   Loss 12.4271   LearningRate 0.0676   Epoch: 3   Global Step: 44210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:12,683-Speed 3046.52 samples/sec   Loss 12.2584   LearningRate 0.0676   Epoch: 3   Global Step: 44220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:16,028-Speed 3062.63 samples/sec   Loss 12.1407   LearningRate 0.0676   Epoch: 3   Global Step: 44230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:19,318-Speed 3113.47 samples/sec   Loss 12.4616   LearningRate 0.0676   Epoch: 3   Global Step: 44240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:22,638-Speed 3085.45 samples/sec   Loss 12.4835   LearningRate 0.0675   Epoch: 3   Global Step: 44250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:26,026-Speed 3022.61 samples/sec   Loss 12.4200   LearningRate 0.0675   Epoch: 3   Global Step: 44260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:29,387-Speed 3048.43 samples/sec   Loss 12.3048   LearningRate 0.0675   Epoch: 3   Global Step: 44270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:32,641-Speed 3147.81 samples/sec   Loss 12.2918   LearningRate 0.0675   Epoch: 3   Global Step: 44280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:35,952-Speed 3093.23 samples/sec   Loss 12.1901   LearningRate 0.0675   Epoch: 3   Global Step: 44290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:39,261-Speed 3096.24 samples/sec   Loss 12.5132   LearningRate 0.0675   Epoch: 3   Global Step: 44300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:42,628-Speed 3041.48 samples/sec   Loss 12.3937   LearningRate 0.0675   Epoch: 3   Global Step: 44310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:46,012-Speed 3027.32 samples/sec   Loss 12.3020   LearningRate 0.0675   Epoch: 3   Global Step: 44320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:49,428-Speed 2998.49 samples/sec   Loss 12.2844   LearningRate 0.0675   Epoch: 3   Global Step: 44330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:52,732-Speed 3100.33 samples/sec   Loss 12.3277   LearningRate 0.0675   Epoch: 3   Global Step: 44340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:53:56,043-Speed 3093.10 samples/sec   Loss 12.3226   LearningRate 0.0675   Epoch: 3   Global Step: 44350   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:53:59,314-Speed 3131.42 samples/sec   Loss 12.3949   LearningRate 0.0675   Epoch: 3   Global Step: 44360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:02,632-Speed 3087.23 samples/sec   Loss 12.3457   LearningRate 0.0675   Epoch: 3   Global Step: 44370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:05,900-Speed 3134.31 samples/sec   Loss 12.3097   LearningRate 0.0675   Epoch: 3   Global Step: 44380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:09,197-Speed 3107.51 samples/sec   Loss 12.3155   LearningRate 0.0675   Epoch: 3   Global Step: 44390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:12,517-Speed 3084.27 samples/sec   Loss 12.3125   LearningRate 0.0674   Epoch: 3   Global Step: 44400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:15,957-Speed 2978.18 samples/sec   Loss 12.4556   LearningRate 0.0674   Epoch: 3   Global Step: 44410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:19,313-Speed 3051.82 samples/sec   Loss 12.2535   LearningRate 0.0674   Epoch: 3   Global Step: 44420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:22,682-Speed 3040.74 samples/sec   Loss 12.4553   LearningRate 0.0674   Epoch: 3   Global Step: 44430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:26,007-Speed 3080.55 samples/sec   Loss 12.2148   LearningRate 0.0674   Epoch: 3   Global Step: 44440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:29,311-Speed 3099.70 samples/sec   Loss 12.2004   LearningRate 0.0674   Epoch: 3   Global Step: 44450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:32,638-Speed 3079.12 samples/sec   Loss 12.3082   LearningRate 0.0674   Epoch: 3   Global Step: 44460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:36,006-Speed 3041.21 samples/sec   Loss 12.3827   LearningRate 0.0674   Epoch: 3   Global Step: 44470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:39,307-Speed 3103.48 samples/sec   Loss 12.2846   LearningRate 0.0674   Epoch: 3   Global Step: 44480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:42,608-Speed 3102.27 samples/sec   Loss 12.2859   LearningRate 0.0674   Epoch: 3   Global Step: 44490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:45,967-Speed 3049.52 samples/sec   Loss 12.3547   LearningRate 0.0674   Epoch: 3   Global Step: 44500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:49,384-Speed 2997.89 samples/sec   Loss 12.2469   LearningRate 0.0674   Epoch: 3   Global Step: 44510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:52,701-Speed 3087.56 samples/sec   Loss 12.3763   LearningRate 0.0674   Epoch: 3   Global Step: 44520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:54:56,002-Speed 3103.47 samples/sec   Loss 12.3060   LearningRate 0.0674   Epoch: 3   Global Step: 44530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:54:59,290-Speed 3114.84 samples/sec   Loss 12.1564   LearningRate 0.0674   Epoch: 3   Global Step: 44540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:55:02,597-Speed 3097.19 samples/sec   Loss 12.2491   LearningRate 0.0673   Epoch: 3   Global Step: 44550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:55:06,013-Speed 2999.02 samples/sec   Loss 12.4150   LearningRate 0.0673   Epoch: 3   Global Step: 44560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:55:09,364-Speed 3056.61 samples/sec   Loss 12.4344   LearningRate 0.0673   Epoch: 3   Global Step: 44570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:55:12,733-Speed 3040.34 samples/sec   Loss 12.2977   LearningRate 0.0673   Epoch: 3   Global Step: 44580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:55:16,075-Speed 3065.14 samples/sec   Loss 12.4036   LearningRate 0.0673   Epoch: 3   Global Step: 44590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:55:19,401-Speed 3079.13 samples/sec   Loss 12.3158   LearningRate 0.0673   Epoch: 3   Global Step: 44600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:55:22,735-Speed 3072.75 samples/sec   Loss 12.1688   LearningRate 0.0673   Epoch: 3   Global Step: 44610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:55:26,073-Speed 3068.35 samples/sec   Loss 12.4049   LearningRate 0.0673   Epoch: 3   Global Step: 44620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:55:29,383-Speed 3094.19 samples/sec   Loss 12.2311   LearningRate 0.0673   Epoch: 3   Global Step: 44630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:55:32,776-Speed 3018.82 samples/sec   Loss 12.3083   LearningRate 0.0673   Epoch: 3   Global Step: 44640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:55:36,103-Speed 3078.72 samples/sec   Loss 12.3306   LearningRate 0.0673   Epoch: 3   Global Step: 44650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:55:39,446-Speed 3064.27 samples/sec   Loss 12.3022   LearningRate 0.0673   Epoch: 3   Global Step: 44660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:55:42,802-Speed 3052.11 samples/sec   Loss 12.4138   LearningRate 0.0673   Epoch: 3   Global Step: 44670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:55:46,133-Speed 3074.78 samples/sec   Loss 12.2831   LearningRate 0.0673   Epoch: 3   Global Step: 44680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:55:49,412-Speed 3124.33 samples/sec   Loss 12.3206   LearningRate 0.0673   Epoch: 3   Global Step: 44690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:55:52,714-Speed 3101.92 samples/sec   Loss 12.4565   LearningRate 0.0673   Epoch: 3   Global Step: 44700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:55:56,040-Speed 3080.22 samples/sec   Loss 12.2082   LearningRate 0.0672   Epoch: 3   Global Step: 44710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:55:59,371-Speed 3074.52 samples/sec   Loss 12.2257   LearningRate 0.0672   Epoch: 3   Global Step: 44720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:56:02,721-Speed 3057.41 samples/sec   Loss 12.2184   LearningRate 0.0672   Epoch: 3   Global Step: 44730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:56:05,997-Speed 3126.98 samples/sec   Loss 12.2583   LearningRate 0.0672   Epoch: 3   Global Step: 44740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:56:09,359-Speed 3046.85 samples/sec   Loss 12.4327   LearningRate 0.0672   Epoch: 3   Global Step: 44750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:56:12,720-Speed 3047.51 samples/sec   Loss 12.4521   LearningRate 0.0672   Epoch: 3   Global Step: 44760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:56:16,064-Speed 3063.03 samples/sec   Loss 12.3670   LearningRate 0.0672   Epoch: 3   Global Step: 44770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:56:19,412-Speed 3059.88 samples/sec   Loss 12.3522   LearningRate 0.0672   Epoch: 3   Global Step: 44780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:56:22,689-Speed 3124.97 samples/sec   Loss 12.1413   LearningRate 0.0672   Epoch: 3   Global Step: 44790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:56:26,026-Speed 3069.80 samples/sec   Loss 12.3570   LearningRate 0.0672   Epoch: 3   Global Step: 44800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:56:29,326-Speed 3104.50 samples/sec   Loss 12.3182   LearningRate 0.0672   Epoch: 3   Global Step: 44810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:56:32,626-Speed 3104.19 samples/sec   Loss 12.3513   LearningRate 0.0672   Epoch: 3   Global Step: 44820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:56:35,932-Speed 3098.64 samples/sec   Loss 12.2420   LearningRate 0.0672   Epoch: 3   Global Step: 44830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:56:39,269-Speed 3069.20 samples/sec   Loss 12.3663   LearningRate 0.0672   Epoch: 3   Global Step: 44840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:56:42,579-Speed 3094.91 samples/sec   Loss 12.2617   LearningRate 0.0672   Epoch: 3   Global Step: 44850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:56:45,905-Speed 3079.72 samples/sec   Loss 12.2341   LearningRate 0.0671   Epoch: 3   Global Step: 44860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:56:49,232-Speed 3077.79 samples/sec   Loss 12.3828   LearningRate 0.0671   Epoch: 3   Global Step: 44870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:56:52,592-Speed 3048.86 samples/sec   Loss 12.3646   LearningRate 0.0671   Epoch: 3   Global Step: 44880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:56:55,874-Speed 3120.69 samples/sec   Loss 12.3610   LearningRate 0.0671   Epoch: 3   Global Step: 44890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:56:59,205-Speed 3075.29 samples/sec   Loss 12.3220   LearningRate 0.0671   Epoch: 3   Global Step: 44900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:02,466-Speed 3141.24 samples/sec   Loss 12.2149   LearningRate 0.0671   Epoch: 3   Global Step: 44910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:05,829-Speed 3045.85 samples/sec   Loss 12.2787   LearningRate 0.0671   Epoch: 3   Global Step: 44920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:09,105-Speed 3126.35 samples/sec   Loss 12.2565   LearningRate 0.0671   Epoch: 3   Global Step: 44930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:12,486-Speed 3029.61 samples/sec   Loss 12.2046   LearningRate 0.0671   Epoch: 3   Global Step: 44940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:15,784-Speed 3106.52 samples/sec   Loss 12.3575   LearningRate 0.0671   Epoch: 3   Global Step: 44950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:19,098-Speed 3090.63 samples/sec   Loss 12.2137   LearningRate 0.0671   Epoch: 3   Global Step: 44960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:22,418-Speed 3085.71 samples/sec   Loss 12.2353   LearningRate 0.0671   Epoch: 3   Global Step: 44970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:25,773-Speed 3052.70 samples/sec   Loss 12.2729   LearningRate 0.0671   Epoch: 3   Global Step: 44980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:29,093-Speed 3085.59 samples/sec   Loss 12.2524   LearningRate 0.0671   Epoch: 3   Global Step: 44990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:32,374-Speed 3121.26 samples/sec   Loss 12.2103   LearningRate 0.0671   Epoch: 3   Global Step: 45000   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:57:35,621-Speed 3154.76 samples/sec   Loss 12.2746   LearningRate 0.0670   Epoch: 3   Global Step: 45010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:38,971-Speed 3058.57 samples/sec   Loss 12.2340   LearningRate 0.0670   Epoch: 3   Global Step: 45020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:42,330-Speed 3049.61 samples/sec   Loss 12.2470   LearningRate 0.0670   Epoch: 3   Global Step: 45030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:45,679-Speed 3059.29 samples/sec   Loss 12.2890   LearningRate 0.0670   Epoch: 3   Global Step: 45040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:48,959-Speed 3124.53 samples/sec   Loss 12.1206   LearningRate 0.0670   Epoch: 3   Global Step: 45050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:52,270-Speed 3094.09 samples/sec   Loss 12.1884   LearningRate 0.0670   Epoch: 3   Global Step: 45060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:55,656-Speed 3025.72 samples/sec   Loss 12.2108   LearningRate 0.0670   Epoch: 3   Global Step: 45070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:57:59,008-Speed 3055.23 samples/sec   Loss 12.3746   LearningRate 0.0670   Epoch: 3   Global Step: 45080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:58:02,403-Speed 3017.94 samples/sec   Loss 12.1899   LearningRate 0.0670   Epoch: 3   Global Step: 45090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:58:05,710-Speed 3097.37 samples/sec   Loss 12.2396   LearningRate 0.0670   Epoch: 3   Global Step: 45100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:58:09,020-Speed 3094.34 samples/sec   Loss 12.3952   LearningRate 0.0670   Epoch: 3   Global Step: 45110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:58:12,324-Speed 3099.84 samples/sec   Loss 12.2921   LearningRate 0.0670   Epoch: 3   Global Step: 45120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:58:15,648-Speed 3081.71 samples/sec   Loss 12.3781   LearningRate 0.0670   Epoch: 3   Global Step: 45130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:58:18,954-Speed 3098.10 samples/sec   Loss 12.2428   LearningRate 0.0670   Epoch: 3   Global Step: 45140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:58:22,295-Speed 3066.31 samples/sec   Loss 12.2178   LearningRate 0.0670   Epoch: 3   Global Step: 45150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:58:25,651-Speed 3052.02 samples/sec   Loss 12.1595   LearningRate 0.0669   Epoch: 3   Global Step: 45160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:58:28,962-Speed 3093.57 samples/sec   Loss 12.1822   LearningRate 0.0669   Epoch: 3   Global Step: 45170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:58:32,273-Speed 3093.42 samples/sec   Loss 12.2966   LearningRate 0.0669   Epoch: 3   Global Step: 45180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:58:35,641-Speed 3041.97 samples/sec   Loss 12.3516   LearningRate 0.0669   Epoch: 3   Global Step: 45190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:58:38,977-Speed 3069.70 samples/sec   Loss 12.2980   LearningRate 0.0669   Epoch: 3   Global Step: 45200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:58:42,272-Speed 3109.23 samples/sec   Loss 12.1896   LearningRate 0.0669   Epoch: 3   Global Step: 45210   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 05:58:45,593-Speed 3085.41 samples/sec   Loss 12.3391   LearningRate 0.0669   Epoch: 3   Global Step: 45220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:58:49,061-Speed 2952.98 samples/sec   Loss 12.3223   LearningRate 0.0669   Epoch: 3   Global Step: 45230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:58:52,434-Speed 3036.65 samples/sec   Loss 12.3270   LearningRate 0.0669   Epoch: 3   Global Step: 45240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:58:55,805-Speed 3039.41 samples/sec   Loss 12.1968   LearningRate 0.0669   Epoch: 3   Global Step: 45250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:58:59,175-Speed 3038.72 samples/sec   Loss 12.1952   LearningRate 0.0669   Epoch: 3   Global Step: 45260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:59:02,584-Speed 3005.44 samples/sec   Loss 12.3980   LearningRate 0.0669   Epoch: 3   Global Step: 45270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:59:05,847-Speed 3138.91 samples/sec   Loss 12.1447   LearningRate 0.0669   Epoch: 3   Global Step: 45280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:59:09,202-Speed 3053.14 samples/sec   Loss 12.1274   LearningRate 0.0669   Epoch: 3   Global Step: 45290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:59:12,523-Speed 3084.39 samples/sec   Loss 12.1944   LearningRate 0.0669   Epoch: 3   Global Step: 45300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:59:15,924-Speed 3011.43 samples/sec   Loss 12.1533   LearningRate 0.0668   Epoch: 3   Global Step: 45310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 05:59:19,298-Speed 3035.67 samples/sec   Loss 12.2455   LearningRate 0.0668   Epoch: 3   Global Step: 45320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:59:22,673-Speed 3034.83 samples/sec   Loss 12.3415   LearningRate 0.0668   Epoch: 3   Global Step: 45330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:59:26,086-Speed 3001.35 samples/sec   Loss 12.2339   LearningRate 0.0668   Epoch: 3   Global Step: 45340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:59:29,433-Speed 3060.54 samples/sec   Loss 12.2373   LearningRate 0.0668   Epoch: 3   Global Step: 45350   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:59:32,722-Speed 3113.96 samples/sec   Loss 12.1898   LearningRate 0.0668   Epoch: 3   Global Step: 45360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:59:36,046-Speed 3081.86 samples/sec   Loss 12.2014   LearningRate 0.0668   Epoch: 3   Global Step: 45370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:59:39,495-Speed 2969.50 samples/sec   Loss 12.3762   LearningRate 0.0668   Epoch: 3   Global Step: 45380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:59:42,819-Speed 3081.67 samples/sec   Loss 12.2697   LearningRate 0.0668   Epoch: 3   Global Step: 45390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:59:46,148-Speed 3076.99 samples/sec   Loss 12.3187   LearningRate 0.0668   Epoch: 3   Global Step: 45400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:59:49,451-Speed 3100.97 samples/sec   Loss 12.2633   LearningRate 0.0668   Epoch: 3   Global Step: 45410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:59:52,790-Speed 3067.98 samples/sec   Loss 12.3408   LearningRate 0.0668   Epoch: 3   Global Step: 45420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:59:56,103-Speed 3091.34 samples/sec   Loss 12.1121   LearningRate 0.0668   Epoch: 3   Global Step: 45430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 05:59:59,467-Speed 3045.28 samples/sec   Loss 12.1909   LearningRate 0.0668   Epoch: 3   Global Step: 45440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:00:02,761-Speed 3109.61 samples/sec   Loss 12.2458   LearningRate 0.0668   Epoch: 3   Global Step: 45450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:00:06,046-Speed 3118.05 samples/sec   Loss 12.4000   LearningRate 0.0667   Epoch: 3   Global Step: 45460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:00:09,413-Speed 3041.42 samples/sec   Loss 12.1353   LearningRate 0.0667   Epoch: 3   Global Step: 45470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:00:12,795-Speed 3029.58 samples/sec   Loss 12.1258   LearningRate 0.0667   Epoch: 3   Global Step: 45480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:00:16,236-Speed 2976.86 samples/sec   Loss 12.3625   LearningRate 0.0667   Epoch: 3   Global Step: 45490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:00:19,581-Speed 3061.95 samples/sec   Loss 12.0884   LearningRate 0.0667   Epoch: 3   Global Step: 45500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:00:22,840-Speed 3143.17 samples/sec   Loss 12.2661   LearningRate 0.0667   Epoch: 3   Global Step: 45510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:00:26,120-Speed 3122.33 samples/sec   Loss 12.2945   LearningRate 0.0667   Epoch: 3   Global Step: 45520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:00:29,429-Speed 3096.31 samples/sec   Loss 12.1118   LearningRate 0.0667   Epoch: 3   Global Step: 45530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:00:32,815-Speed 3024.82 samples/sec   Loss 12.2109   LearningRate 0.0667   Epoch: 3   Global Step: 45540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:00:36,137-Speed 3082.97 samples/sec   Loss 12.2981   LearningRate 0.0667   Epoch: 3   Global Step: 45550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:00:39,442-Speed 3099.64 samples/sec   Loss 12.1699   LearningRate 0.0667   Epoch: 3   Global Step: 45560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:00:42,758-Speed 3088.59 samples/sec   Loss 12.0507   LearningRate 0.0667   Epoch: 3   Global Step: 45570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:00:46,063-Speed 3100.60 samples/sec   Loss 12.3340   LearningRate 0.0667   Epoch: 3   Global Step: 45580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:00:49,436-Speed 3036.72 samples/sec   Loss 12.1204   LearningRate 0.0667   Epoch: 3   Global Step: 45590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:00:52,830-Speed 3017.18 samples/sec   Loss 12.1528   LearningRate 0.0667   Epoch: 3   Global Step: 45600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:00:56,166-Speed 3070.85 samples/sec   Loss 12.2769   LearningRate 0.0667   Epoch: 3   Global Step: 45610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:00:59,517-Speed 3056.35 samples/sec   Loss 12.2296   LearningRate 0.0666   Epoch: 3   Global Step: 45620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:02,915-Speed 3014.92 samples/sec   Loss 12.3475   LearningRate 0.0666   Epoch: 3   Global Step: 45630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:06,252-Speed 3069.16 samples/sec   Loss 12.3297   LearningRate 0.0666   Epoch: 3   Global Step: 45640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:09,583-Speed 3075.38 samples/sec   Loss 12.2000   LearningRate 0.0666   Epoch: 3   Global Step: 45650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:12,883-Speed 3104.17 samples/sec   Loss 12.1228   LearningRate 0.0666   Epoch: 3   Global Step: 45660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:16,198-Speed 3092.20 samples/sec   Loss 12.1283   LearningRate 0.0666   Epoch: 3   Global Step: 45670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:19,496-Speed 3105.57 samples/sec   Loss 12.1286   LearningRate 0.0666   Epoch: 3   Global Step: 45680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:22,804-Speed 3096.03 samples/sec   Loss 12.3008   LearningRate 0.0666   Epoch: 3   Global Step: 45690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:26,191-Speed 3024.12 samples/sec   Loss 12.1592   LearningRate 0.0666   Epoch: 3   Global Step: 45700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:29,531-Speed 3067.34 samples/sec   Loss 12.2270   LearningRate 0.0666   Epoch: 3   Global Step: 45710   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:32,886-Speed 3052.88 samples/sec   Loss 12.3166   LearningRate 0.0666   Epoch: 3   Global Step: 45720   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:36,284-Speed 3014.98 samples/sec   Loss 12.1846   LearningRate 0.0666   Epoch: 3   Global Step: 45730   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:39,601-Speed 3088.05 samples/sec   Loss 12.2877   LearningRate 0.0666   Epoch: 3   Global Step: 45740   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:42,961-Speed 3048.89 samples/sec   Loss 12.2481   LearningRate 0.0666   Epoch: 3   Global Step: 45750   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:01:46,212-Speed 3150.26 samples/sec   Loss 12.3965   LearningRate 0.0666   Epoch: 3   Global Step: 45760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:01:49,498-Speed 3117.66 samples/sec   Loss 12.3217   LearningRate 0.0665   Epoch: 3   Global Step: 45770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:01:52,814-Speed 3088.61 samples/sec   Loss 12.3936   LearningRate 0.0665   Epoch: 3   Global Step: 45780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:01:56,122-Speed 3096.05 samples/sec   Loss 12.2450   LearningRate 0.0665   Epoch: 3   Global Step: 45790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:01:59,412-Speed 3114.03 samples/sec   Loss 12.2030   LearningRate 0.0665   Epoch: 3   Global Step: 45800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:02:02,703-Speed 3112.15 samples/sec   Loss 12.1002   LearningRate 0.0665   Epoch: 3   Global Step: 45810   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:02:05,975-Speed 3131.42 samples/sec   Loss 12.3087   LearningRate 0.0665   Epoch: 3   Global Step: 45820   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:02:09,277-Speed 3102.45 samples/sec   Loss 12.1870   LearningRate 0.0665   Epoch: 3   Global Step: 45830   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:02:12,562-Speed 3118.39 samples/sec   Loss 12.3334   LearningRate 0.0665   Epoch: 3   Global Step: 45840   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:02:15,855-Speed 3109.77 samples/sec   Loss 12.3737   LearningRate 0.0665   Epoch: 3   Global Step: 45850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:02:19,213-Speed 3049.98 samples/sec   Loss 12.2590   LearningRate 0.0665   Epoch: 3   Global Step: 45860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:02:22,589-Speed 3034.07 samples/sec   Loss 12.2717   LearningRate 0.0665   Epoch: 3   Global Step: 45870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:02:25,978-Speed 3022.81 samples/sec   Loss 12.1308   LearningRate 0.0665   Epoch: 3   Global Step: 45880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:02:29,369-Speed 3020.64 samples/sec   Loss 12.2963   LearningRate 0.0665   Epoch: 3   Global Step: 45890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:02:32,666-Speed 3106.37 samples/sec   Loss 12.1359   LearningRate 0.0665   Epoch: 3   Global Step: 45900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:02:36,047-Speed 3029.69 samples/sec   Loss 12.0449   LearningRate 0.0665   Epoch: 3   Global Step: 45910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:02:39,398-Speed 3056.64 samples/sec   Loss 12.2284   LearningRate 0.0664   Epoch: 3   Global Step: 45920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:02:42,698-Speed 3104.49 samples/sec   Loss 12.0871   LearningRate 0.0664   Epoch: 3   Global Step: 45930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:02:46,004-Speed 3097.90 samples/sec   Loss 12.1692   LearningRate 0.0664   Epoch: 3   Global Step: 45940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:02:49,339-Speed 3071.92 samples/sec   Loss 12.1835   LearningRate 0.0664   Epoch: 3   Global Step: 45950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:02:52,675-Speed 3070.09 samples/sec   Loss 12.1830   LearningRate 0.0664   Epoch: 3   Global Step: 45960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:02:56,062-Speed 3024.69 samples/sec   Loss 12.1724   LearningRate 0.0664   Epoch: 3   Global Step: 45970   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:02:59,429-Speed 3041.91 samples/sec   Loss 12.2625   LearningRate 0.0664   Epoch: 3   Global Step: 45980   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:02,801-Speed 3037.40 samples/sec   Loss 12.1990   LearningRate 0.0664   Epoch: 3   Global Step: 45990   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:06,175-Speed 3035.58 samples/sec   Loss 12.0902   LearningRate 0.0664   Epoch: 3   Global Step: 46000   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:09,471-Speed 3108.45 samples/sec   Loss 12.0105   LearningRate 0.0664   Epoch: 3   Global Step: 46010   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:12,820-Speed 3058.40 samples/sec   Loss 12.2529   LearningRate 0.0664   Epoch: 3   Global Step: 46020   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:16,135-Speed 3090.59 samples/sec   Loss 12.1180   LearningRate 0.0664   Epoch: 3   Global Step: 46030   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:19,416-Speed 3121.48 samples/sec   Loss 12.2574   LearningRate 0.0664   Epoch: 3   Global Step: 46040   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:22,716-Speed 3104.20 samples/sec   Loss 12.1624   LearningRate 0.0664   Epoch: 3   Global Step: 46050   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:26,053-Speed 3069.61 samples/sec   Loss 12.1019   LearningRate 0.0664   Epoch: 3   Global Step: 46060   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:29,424-Speed 3038.12 samples/sec   Loss 12.2317   LearningRate 0.0663   Epoch: 3   Global Step: 46070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:32,760-Speed 3070.83 samples/sec   Loss 12.3519   LearningRate 0.0663   Epoch: 3   Global Step: 46080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:36,058-Speed 3105.99 samples/sec   Loss 11.9728   LearningRate 0.0663   Epoch: 3   Global Step: 46090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:39,351-Speed 3110.53 samples/sec   Loss 12.1091   LearningRate 0.0663   Epoch: 3   Global Step: 46100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:42,691-Speed 3066.82 samples/sec   Loss 12.1851   LearningRate 0.0663   Epoch: 3   Global Step: 46110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:46,028-Speed 3069.81 samples/sec   Loss 12.3646   LearningRate 0.0663   Epoch: 3   Global Step: 46120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:49,370-Speed 3064.08 samples/sec   Loss 12.2763   LearningRate 0.0663   Epoch: 3   Global Step: 46130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:52,733-Speed 3046.48 samples/sec   Loss 12.2401   LearningRate 0.0663   Epoch: 3   Global Step: 46140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:56,064-Speed 3074.41 samples/sec   Loss 12.1488   LearningRate 0.0663   Epoch: 3   Global Step: 46150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:03:59,328-Speed 3138.39 samples/sec   Loss 12.0901   LearningRate 0.0663   Epoch: 3   Global Step: 46160   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 06:04:02,695-Speed 3042.08 samples/sec   Loss 12.1838   LearningRate 0.0663   Epoch: 3   Global Step: 46170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:06,035-Speed 3066.95 samples/sec   Loss 12.1757   LearningRate 0.0663   Epoch: 3   Global Step: 46180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:09,351-Speed 3088.87 samples/sec   Loss 12.2286   LearningRate 0.0663   Epoch: 3   Global Step: 46190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:12,704-Speed 3055.51 samples/sec   Loss 12.1668   LearningRate 0.0663   Epoch: 3   Global Step: 46200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:16,076-Speed 3037.36 samples/sec   Loss 12.3754   LearningRate 0.0663   Epoch: 3   Global Step: 46210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:19,400-Speed 3082.34 samples/sec   Loss 12.1156   LearningRate 0.0663   Epoch: 3   Global Step: 46220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:22,732-Speed 3073.61 samples/sec   Loss 12.3309   LearningRate 0.0662   Epoch: 3   Global Step: 46230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:26,039-Speed 3097.69 samples/sec   Loss 12.3336   LearningRate 0.0662   Epoch: 3   Global Step: 46240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:29,338-Speed 3104.91 samples/sec   Loss 12.2384   LearningRate 0.0662   Epoch: 3   Global Step: 46250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:32,641-Speed 3101.38 samples/sec   Loss 12.0258   LearningRate 0.0662   Epoch: 3   Global Step: 46260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:35,991-Speed 3058.01 samples/sec   Loss 12.1039   LearningRate 0.0662   Epoch: 3   Global Step: 46270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:39,289-Speed 3106.20 samples/sec   Loss 12.3052   LearningRate 0.0662   Epoch: 3   Global Step: 46280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:42,641-Speed 3055.11 samples/sec   Loss 12.2394   LearningRate 0.0662   Epoch: 3   Global Step: 46290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:45,937-Speed 3108.31 samples/sec   Loss 12.1883   LearningRate 0.0662   Epoch: 3   Global Step: 46300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:49,352-Speed 2998.64 samples/sec   Loss 12.1557   LearningRate 0.0662   Epoch: 3   Global Step: 46310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:52,719-Speed 3042.38 samples/sec   Loss 12.2865   LearningRate 0.0662   Epoch: 3   Global Step: 46320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:56,127-Speed 3006.14 samples/sec   Loss 12.2572   LearningRate 0.0662   Epoch: 3   Global Step: 46330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:04:59,465-Speed 3068.82 samples/sec   Loss 12.1056   LearningRate 0.0662   Epoch: 3   Global Step: 46340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:05:02,853-Speed 3022.95 samples/sec   Loss 12.1057   LearningRate 0.0662   Epoch: 3   Global Step: 46350   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:05:06,182-Speed 3077.38 samples/sec   Loss 12.0857   LearningRate 0.0662   Epoch: 3   Global Step: 46360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:05:09,524-Speed 3064.79 samples/sec   Loss 12.1423   LearningRate 0.0662   Epoch: 3   Global Step: 46370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:05:12,931-Speed 3005.83 samples/sec   Loss 12.2254   LearningRate 0.0661   Epoch: 3   Global Step: 46380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:05:16,210-Speed 3124.43 samples/sec   Loss 12.2817   LearningRate 0.0661   Epoch: 3   Global Step: 46390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:05:19,507-Speed 3106.79 samples/sec   Loss 12.1150   LearningRate 0.0661   Epoch: 3   Global Step: 46400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:05:22,814-Speed 3097.31 samples/sec   Loss 12.0503   LearningRate 0.0661   Epoch: 3   Global Step: 46410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:05:26,136-Speed 3083.04 samples/sec   Loss 12.2277   LearningRate 0.0661   Epoch: 3   Global Step: 46420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:05:29,511-Speed 3035.46 samples/sec   Loss 12.1283   LearningRate 0.0661   Epoch: 3   Global Step: 46430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:05:32,837-Speed 3079.06 samples/sec   Loss 12.1088   LearningRate 0.0661   Epoch: 3   Global Step: 46440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:05:36,157-Speed 3084.95 samples/sec   Loss 12.1310   LearningRate 0.0661   Epoch: 3   Global Step: 46450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:05:39,477-Speed 3085.75 samples/sec   Loss 12.1815   LearningRate 0.0661   Epoch: 3   Global Step: 46460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:05:42,823-Speed 3060.29 samples/sec   Loss 12.0538   LearningRate 0.0661   Epoch: 3   Global Step: 46470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:05:46,084-Speed 3142.01 samples/sec   Loss 12.1102   LearningRate 0.0661   Epoch: 3   Global Step: 46480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:05:49,398-Speed 3090.84 samples/sec   Loss 12.0994   LearningRate 0.0661   Epoch: 3   Global Step: 46490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:05:52,757-Speed 3049.49 samples/sec   Loss 12.0570   LearningRate 0.0661   Epoch: 3   Global Step: 46500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:05:56,101-Speed 3062.90 samples/sec   Loss 12.1123   LearningRate 0.0661   Epoch: 3   Global Step: 46510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:05:59,515-Speed 2999.94 samples/sec   Loss 12.3364   LearningRate 0.0661   Epoch: 3   Global Step: 46520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:06:02,928-Speed 3001.05 samples/sec   Loss 12.0844   LearningRate 0.0660   Epoch: 3   Global Step: 46530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:06,300-Speed 3037.95 samples/sec   Loss 12.2107   LearningRate 0.0660   Epoch: 3   Global Step: 46540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:09,716-Speed 2998.09 samples/sec   Loss 12.1246   LearningRate 0.0660   Epoch: 3   Global Step: 46550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:13,039-Speed 3082.16 samples/sec   Loss 12.0701   LearningRate 0.0660   Epoch: 3   Global Step: 46560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:16,392-Speed 3055.88 samples/sec   Loss 12.3296   LearningRate 0.0660   Epoch: 3   Global Step: 46570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:19,794-Speed 3010.24 samples/sec   Loss 12.1560   LearningRate 0.0660   Epoch: 3   Global Step: 46580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:23,161-Speed 3042.29 samples/sec   Loss 12.0234   LearningRate 0.0660   Epoch: 3   Global Step: 46590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:26,544-Speed 3027.47 samples/sec   Loss 12.2341   LearningRate 0.0660   Epoch: 3   Global Step: 46600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:29,848-Speed 3100.37 samples/sec   Loss 12.0960   LearningRate 0.0660   Epoch: 3   Global Step: 46610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:33,170-Speed 3083.93 samples/sec   Loss 12.0924   LearningRate 0.0660   Epoch: 3   Global Step: 46620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:36,466-Speed 3107.37 samples/sec   Loss 11.9782   LearningRate 0.0660   Epoch: 3   Global Step: 46630   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 06:06:39,779-Speed 3091.49 samples/sec   Loss 12.2870   LearningRate 0.0660   Epoch: 3   Global Step: 46640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:43,058-Speed 3124.50 samples/sec   Loss 12.0518   LearningRate 0.0660   Epoch: 3   Global Step: 46650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:46,404-Speed 3060.84 samples/sec   Loss 12.2461   LearningRate 0.0660   Epoch: 3   Global Step: 46660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:49,690-Speed 3116.87 samples/sec   Loss 12.0692   LearningRate 0.0660   Epoch: 3   Global Step: 46670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:06:52,988-Speed 3106.30 samples/sec   Loss 12.2173   LearningRate 0.0659   Epoch: 3   Global Step: 46680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:06:56,311-Speed 3082.54 samples/sec   Loss 12.1715   LearningRate 0.0659   Epoch: 3   Global Step: 46690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:06:59,660-Speed 3058.32 samples/sec   Loss 12.1746   LearningRate 0.0659   Epoch: 3   Global Step: 46700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:07:03,001-Speed 3066.10 samples/sec   Loss 12.2786   LearningRate 0.0659   Epoch: 3   Global Step: 46710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:07:06,351-Speed 3057.93 samples/sec   Loss 12.1090   LearningRate 0.0659   Epoch: 3   Global Step: 46720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:07:09,707-Speed 3052.00 samples/sec   Loss 12.0826   LearningRate 0.0659   Epoch: 3   Global Step: 46730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:07:13,040-Speed 3072.98 samples/sec   Loss 12.0911   LearningRate 0.0659   Epoch: 3   Global Step: 46740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:07:16,403-Speed 3046.18 samples/sec   Loss 12.1108   LearningRate 0.0659   Epoch: 3   Global Step: 46750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:07:19,783-Speed 3030.43 samples/sec   Loss 12.1841   LearningRate 0.0659   Epoch: 3   Global Step: 46760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:07:23,093-Speed 3094.97 samples/sec   Loss 12.0911   LearningRate 0.0659   Epoch: 3   Global Step: 46770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:07:26,480-Speed 3024.23 samples/sec   Loss 12.1768   LearningRate 0.0659   Epoch: 3   Global Step: 46780   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:07:29,822-Speed 3064.19 samples/sec   Loss 12.1760   LearningRate 0.0659   Epoch: 3   Global Step: 46790   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:07:33,205-Speed 3027.90 samples/sec   Loss 12.1735   LearningRate 0.0659   Epoch: 3   Global Step: 46800   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:07:36,593-Speed 3023.80 samples/sec   Loss 12.1883   LearningRate 0.0659   Epoch: 3   Global Step: 46810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:07:39,953-Speed 3048.38 samples/sec   Loss 12.0950   LearningRate 0.0659   Epoch: 3   Global Step: 46820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:07:43,342-Speed 3022.94 samples/sec   Loss 12.2753   LearningRate 0.0659   Epoch: 3   Global Step: 46830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:07:46,692-Speed 3057.19 samples/sec   Loss 12.0439   LearningRate 0.0658   Epoch: 3   Global Step: 46840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:07:50,000-Speed 3096.94 samples/sec   Loss 12.1505   LearningRate 0.0658   Epoch: 3   Global Step: 46850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:07:53,376-Speed 3034.11 samples/sec   Loss 12.1665   LearningRate 0.0658   Epoch: 3   Global Step: 46860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:07:56,738-Speed 3046.65 samples/sec   Loss 12.1962   LearningRate 0.0658   Epoch: 3   Global Step: 46870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:00,058-Speed 3084.72 samples/sec   Loss 12.1061   LearningRate 0.0658   Epoch: 3   Global Step: 46880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:03,330-Speed 3130.35 samples/sec   Loss 12.1100   LearningRate 0.0658   Epoch: 3   Global Step: 46890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:06,662-Speed 3074.68 samples/sec   Loss 12.0665   LearningRate 0.0658   Epoch: 3   Global Step: 46900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:10,009-Speed 3060.72 samples/sec   Loss 12.1966   LearningRate 0.0658   Epoch: 3   Global Step: 46910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:13,401-Speed 3019.09 samples/sec   Loss 12.2276   LearningRate 0.0658   Epoch: 3   Global Step: 46920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:16,764-Speed 3046.17 samples/sec   Loss 12.1021   LearningRate 0.0658   Epoch: 3   Global Step: 46930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:20,053-Speed 3114.23 samples/sec   Loss 12.1109   LearningRate 0.0658   Epoch: 3   Global Step: 46940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:23,355-Speed 3102.95 samples/sec   Loss 12.2303   LearningRate 0.0658   Epoch: 3   Global Step: 46950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:08:26,682-Speed 3078.90 samples/sec   Loss 12.2072   LearningRate 0.0658   Epoch: 3   Global Step: 46960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:30,021-Speed 3067.25 samples/sec   Loss 12.0061   LearningRate 0.0658   Epoch: 3   Global Step: 46970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:33,342-Speed 3084.53 samples/sec   Loss 12.2295   LearningRate 0.0658   Epoch: 3   Global Step: 46980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:36,649-Speed 3097.68 samples/sec   Loss 12.0647   LearningRate 0.0657   Epoch: 3   Global Step: 46990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:39,939-Speed 3113.61 samples/sec   Loss 11.9765   LearningRate 0.0657   Epoch: 3   Global Step: 47000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:43,296-Speed 3050.54 samples/sec   Loss 12.3553   LearningRate 0.0657   Epoch: 3   Global Step: 47010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:46,670-Speed 3036.22 samples/sec   Loss 12.2312   LearningRate 0.0657   Epoch: 3   Global Step: 47020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:50,013-Speed 3064.25 samples/sec   Loss 12.0114   LearningRate 0.0657   Epoch: 3   Global Step: 47030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:53,408-Speed 3016.73 samples/sec   Loss 12.1842   LearningRate 0.0657   Epoch: 3   Global Step: 47040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:08:56,759-Speed 3057.30 samples/sec   Loss 12.0820   LearningRate 0.0657   Epoch: 3   Global Step: 47050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:09:00,115-Speed 3051.32 samples/sec   Loss 12.2177   LearningRate 0.0657   Epoch: 3   Global Step: 47060   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:09:03,416-Speed 3103.79 samples/sec   Loss 12.2724   LearningRate 0.0657   Epoch: 3   Global Step: 47070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:09:06,754-Speed 3068.44 samples/sec   Loss 11.9520   LearningRate 0.0657   Epoch: 3   Global Step: 47080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:09:10,061-Speed 3097.69 samples/sec   Loss 12.1612   LearningRate 0.0657   Epoch: 3   Global Step: 47090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:09:13,442-Speed 3029.59 samples/sec   Loss 12.2317   LearningRate 0.0657   Epoch: 3   Global Step: 47100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:09:16,747-Speed 3099.51 samples/sec   Loss 12.0277   LearningRate 0.0657   Epoch: 3   Global Step: 47110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:09:20,080-Speed 3072.91 samples/sec   Loss 12.0614   LearningRate 0.0657   Epoch: 3   Global Step: 47120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:09:23,390-Speed 3094.84 samples/sec   Loss 12.1619   LearningRate 0.0657   Epoch: 3   Global Step: 47130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:09:26,717-Speed 3078.57 samples/sec   Loss 12.1054   LearningRate 0.0656   Epoch: 3   Global Step: 47140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:09:29,998-Speed 3122.12 samples/sec   Loss 12.0885   LearningRate 0.0656   Epoch: 3   Global Step: 47150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:09:33,326-Speed 3077.46 samples/sec   Loss 12.1081   LearningRate 0.0656   Epoch: 3   Global Step: 47160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:09:36,655-Speed 3076.94 samples/sec   Loss 12.1633   LearningRate 0.0656   Epoch: 3   Global Step: 47170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:09:40,012-Speed 3051.07 samples/sec   Loss 12.2000   LearningRate 0.0656   Epoch: 3   Global Step: 47180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:09:43,309-Speed 3107.65 samples/sec   Loss 12.1628   LearningRate 0.0656   Epoch: 3   Global Step: 47190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:09:46,577-Speed 3134.36 samples/sec   Loss 12.1095   LearningRate 0.0656   Epoch: 3   Global Step: 47200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:09:49,949-Speed 3037.42 samples/sec   Loss 11.9439   LearningRate 0.0656   Epoch: 3   Global Step: 47210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:09:53,378-Speed 2987.04 samples/sec   Loss 12.1476   LearningRate 0.0656   Epoch: 3   Global Step: 47220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:09:56,753-Speed 3035.97 samples/sec   Loss 12.0995   LearningRate 0.0656   Epoch: 3   Global Step: 47230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:00,121-Speed 3040.91 samples/sec   Loss 12.1312   LearningRate 0.0656   Epoch: 3   Global Step: 47240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:03,503-Speed 3028.57 samples/sec   Loss 12.1437   LearningRate 0.0656   Epoch: 3   Global Step: 47250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:06,901-Speed 3014.33 samples/sec   Loss 12.1619   LearningRate 0.0656   Epoch: 3   Global Step: 47260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:10,232-Speed 3074.95 samples/sec   Loss 12.1585   LearningRate 0.0656   Epoch: 3   Global Step: 47270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:13,552-Speed 3085.45 samples/sec   Loss 12.1127   LearningRate 0.0656   Epoch: 3   Global Step: 47280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:16,880-Speed 3078.11 samples/sec   Loss 12.0247   LearningRate 0.0656   Epoch: 3   Global Step: 47290   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:20,245-Speed 3043.91 samples/sec   Loss 12.1246   LearningRate 0.0655   Epoch: 3   Global Step: 47300   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:23,607-Speed 3046.30 samples/sec   Loss 12.0968   LearningRate 0.0655   Epoch: 3   Global Step: 47310   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:26,940-Speed 3074.13 samples/sec   Loss 12.0554   LearningRate 0.0655   Epoch: 3   Global Step: 47320   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:30,286-Speed 3060.54 samples/sec   Loss 12.0931   LearningRate 0.0655   Epoch: 3   Global Step: 47330   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:33,683-Speed 3016.50 samples/sec   Loss 11.9874   LearningRate 0.0655   Epoch: 3   Global Step: 47340   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:37,051-Speed 3040.45 samples/sec   Loss 11.9903   LearningRate 0.0655   Epoch: 3   Global Step: 47350   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:40,385-Speed 3072.38 samples/sec   Loss 12.1202   LearningRate 0.0655   Epoch: 3   Global Step: 47360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:43,751-Speed 3043.22 samples/sec   Loss 12.0959   LearningRate 0.0655   Epoch: 3   Global Step: 47370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:47,129-Speed 3032.06 samples/sec   Loss 11.9588   LearningRate 0.0655   Epoch: 3   Global Step: 47380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:50,534-Speed 3008.04 samples/sec   Loss 12.1008   LearningRate 0.0655   Epoch: 3   Global Step: 47390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:53,887-Speed 3055.37 samples/sec   Loss 12.0515   LearningRate 0.0655   Epoch: 3   Global Step: 47400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:10:57,211-Speed 3081.37 samples/sec   Loss 12.2769   LearningRate 0.0655   Epoch: 3   Global Step: 47410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:00,560-Speed 3059.02 samples/sec   Loss 12.0777   LearningRate 0.0655   Epoch: 3   Global Step: 47420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:03,933-Speed 3036.67 samples/sec   Loss 12.1677   LearningRate 0.0655   Epoch: 3   Global Step: 47430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:07,269-Speed 3070.70 samples/sec   Loss 12.0520   LearningRate 0.0655   Epoch: 3   Global Step: 47440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:10,538-Speed 3133.72 samples/sec   Loss 12.1240   LearningRate 0.0654   Epoch: 3   Global Step: 47450   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:13,818-Speed 3122.13 samples/sec   Loss 11.9635   LearningRate 0.0654   Epoch: 3   Global Step: 47460   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:17,179-Speed 3048.21 samples/sec   Loss 12.0728   LearningRate 0.0654   Epoch: 3   Global Step: 47470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:20,534-Speed 3053.07 samples/sec   Loss 12.0544   LearningRate 0.0654   Epoch: 3   Global Step: 47480   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:23,890-Speed 3051.79 samples/sec   Loss 12.1657   LearningRate 0.0654   Epoch: 3   Global Step: 47490   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:27,191-Speed 3103.03 samples/sec   Loss 12.0951   LearningRate 0.0654   Epoch: 3   Global Step: 47500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:30,569-Speed 3031.94 samples/sec   Loss 11.9874   LearningRate 0.0654   Epoch: 3   Global Step: 47510   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 06:11:33,955-Speed 3025.58 samples/sec   Loss 12.0162   LearningRate 0.0654   Epoch: 3   Global Step: 47520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:37,312-Speed 3051.05 samples/sec   Loss 12.0082   LearningRate 0.0654   Epoch: 3   Global Step: 47530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:40,725-Speed 3000.94 samples/sec   Loss 12.0262   LearningRate 0.0654   Epoch: 3   Global Step: 47540   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:44,072-Speed 3060.22 samples/sec   Loss 12.0793   LearningRate 0.0654   Epoch: 3   Global Step: 47550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:47,368-Speed 3108.10 samples/sec   Loss 11.9963   LearningRate 0.0654   Epoch: 3   Global Step: 47560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:50,669-Speed 3102.85 samples/sec   Loss 12.0906   LearningRate 0.0654   Epoch: 3   Global Step: 47570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:54,054-Speed 3026.02 samples/sec   Loss 12.1313   LearningRate 0.0654   Epoch: 3   Global Step: 47580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:11:57,395-Speed 3066.68 samples/sec   Loss 12.2214   LearningRate 0.0654   Epoch: 3   Global Step: 47590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:12:00,741-Speed 3061.01 samples/sec   Loss 12.0871   LearningRate 0.0653   Epoch: 3   Global Step: 47600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:12:04,160-Speed 2995.94 samples/sec   Loss 12.1443   LearningRate 0.0653   Epoch: 3   Global Step: 47610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:12:07,486-Speed 3079.27 samples/sec   Loss 12.1358   LearningRate 0.0653   Epoch: 3   Global Step: 47620   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 06:12:10,756-Speed 3131.96 samples/sec   Loss 12.1064   LearningRate 0.0653   Epoch: 3   Global Step: 47630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:12:14,118-Speed 3047.40 samples/sec   Loss 12.0990   LearningRate 0.0653   Epoch: 3   Global Step: 47640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:12:17,436-Speed 3086.99 samples/sec   Loss 12.1277   LearningRate 0.0653   Epoch: 3   Global Step: 47650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:12:20,805-Speed 3039.71 samples/sec   Loss 11.9947   LearningRate 0.0653   Epoch: 3   Global Step: 47660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:12:24,237-Speed 2985.05 samples/sec   Loss 12.0681   LearningRate 0.0653   Epoch: 3   Global Step: 47670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:12:27,565-Speed 3077.25 samples/sec   Loss 11.9034   LearningRate 0.0653   Epoch: 3   Global Step: 47680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:12:30,960-Speed 3020.48 samples/sec   Loss 12.0767   LearningRate 0.0653   Epoch: 3   Global Step: 47690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:12:34,308-Speed 3059.12 samples/sec   Loss 12.0780   LearningRate 0.0653   Epoch: 3   Global Step: 47700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:12:37,613-Speed 3099.64 samples/sec   Loss 12.0977   LearningRate 0.0653   Epoch: 3   Global Step: 47710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:12:40,998-Speed 3026.43 samples/sec   Loss 11.8794   LearningRate 0.0653   Epoch: 3   Global Step: 47720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:12:44,381-Speed 3027.63 samples/sec   Loss 12.0886   LearningRate 0.0653   Epoch: 3   Global Step: 47730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:12:47,739-Speed 3050.50 samples/sec   Loss 12.1018   LearningRate 0.0653   Epoch: 3   Global Step: 47740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:12:51,038-Speed 3104.95 samples/sec   Loss 11.8268   LearningRate 0.0653   Epoch: 3   Global Step: 47750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:12:54,417-Speed 3031.28 samples/sec   Loss 12.0410   LearningRate 0.0652   Epoch: 3   Global Step: 47760   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:12:57,819-Speed 3010.93 samples/sec   Loss 11.8594   LearningRate 0.0652   Epoch: 3   Global Step: 47770   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:01,195-Speed 3033.90 samples/sec   Loss 12.0085   LearningRate 0.0652   Epoch: 3   Global Step: 47780   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:04,615-Speed 2994.84 samples/sec   Loss 12.0599   LearningRate 0.0652   Epoch: 3   Global Step: 47790   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:07,964-Speed 3059.25 samples/sec   Loss 11.9746   LearningRate 0.0652   Epoch: 3   Global Step: 47800   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:11,318-Speed 3053.75 samples/sec   Loss 12.1198   LearningRate 0.0652   Epoch: 3   Global Step: 47810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:14,628-Speed 3093.86 samples/sec   Loss 12.0721   LearningRate 0.0652   Epoch: 3   Global Step: 47820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:18,013-Speed 3026.38 samples/sec   Loss 12.0804   LearningRate 0.0652   Epoch: 3   Global Step: 47830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:21,353-Speed 3067.25 samples/sec   Loss 11.9830   LearningRate 0.0652   Epoch: 3   Global Step: 47840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:24,636-Speed 3119.84 samples/sec   Loss 12.0987   LearningRate 0.0652   Epoch: 3   Global Step: 47850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:27,943-Speed 3097.24 samples/sec   Loss 12.1273   LearningRate 0.0652   Epoch: 3   Global Step: 47860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:31,241-Speed 3105.31 samples/sec   Loss 12.0696   LearningRate 0.0652   Epoch: 3   Global Step: 47870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:34,559-Speed 3087.32 samples/sec   Loss 12.0750   LearningRate 0.0652   Epoch: 3   Global Step: 47880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:37,866-Speed 3098.22 samples/sec   Loss 12.0281   LearningRate 0.0652   Epoch: 3   Global Step: 47890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:41,140-Speed 3128.39 samples/sec   Loss 11.7802   LearningRate 0.0652   Epoch: 3   Global Step: 47900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:44,509-Speed 3040.80 samples/sec   Loss 11.9988   LearningRate 0.0651   Epoch: 3   Global Step: 47910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:47,808-Speed 3105.13 samples/sec   Loss 11.9639   LearningRate 0.0651   Epoch: 3   Global Step: 47920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:51,116-Speed 3096.43 samples/sec   Loss 11.9876   LearningRate 0.0651   Epoch: 3   Global Step: 47930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:54,478-Speed 3046.91 samples/sec   Loss 12.0508   LearningRate 0.0651   Epoch: 3   Global Step: 47940   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:13:57,909-Speed 2984.73 samples/sec   Loss 11.9857   LearningRate 0.0651   Epoch: 3   Global Step: 47950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:14:01,265-Speed 3052.74 samples/sec   Loss 12.1019   LearningRate 0.0651   Epoch: 3   Global Step: 47960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:14:04,588-Speed 3082.22 samples/sec   Loss 12.0955   LearningRate 0.0651   Epoch: 3   Global Step: 47970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:14:07,920-Speed 3074.77 samples/sec   Loss 12.1171   LearningRate 0.0651   Epoch: 3   Global Step: 47980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:14:11,215-Speed 3108.49 samples/sec   Loss 12.1011   LearningRate 0.0651   Epoch: 3   Global Step: 47990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:14:14,567-Speed 3055.45 samples/sec   Loss 11.9936   LearningRate 0.0651   Epoch: 3   Global Step: 48000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:14:17,914-Speed 3060.67 samples/sec   Loss 12.0503   LearningRate 0.0651   Epoch: 3   Global Step: 48010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:14:21,282-Speed 3041.28 samples/sec   Loss 12.0815   LearningRate 0.0651   Epoch: 3   Global Step: 48020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:14:24,573-Speed 3112.94 samples/sec   Loss 12.0213   LearningRate 0.0651   Epoch: 3   Global Step: 48030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:14:27,909-Speed 3069.93 samples/sec   Loss 11.9802   LearningRate 0.0651   Epoch: 3   Global Step: 48040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:14:31,204-Speed 3109.02 samples/sec   Loss 12.0263   LearningRate 0.0651   Epoch: 3   Global Step: 48050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:14:34,521-Speed 3088.05 samples/sec   Loss 12.0217   LearningRate 0.0651   Epoch: 3   Global Step: 48060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:14:37,839-Speed 3086.76 samples/sec   Loss 11.9985   LearningRate 0.0650   Epoch: 3   Global Step: 48070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:14:41,142-Speed 3100.87 samples/sec   Loss 12.0295   LearningRate 0.0650   Epoch: 3   Global Step: 48080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:14:44,488-Speed 3061.64 samples/sec   Loss 11.9656   LearningRate 0.0650   Epoch: 3   Global Step: 48090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:14:47,780-Speed 3111.49 samples/sec   Loss 12.0758   LearningRate 0.0650   Epoch: 3   Global Step: 48100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:14:51,071-Speed 3112.16 samples/sec   Loss 11.9947   LearningRate 0.0650   Epoch: 3   Global Step: 48110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:14:54,394-Speed 3083.02 samples/sec   Loss 12.1112   LearningRate 0.0650   Epoch: 3   Global Step: 48120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:14:57,805-Speed 3003.17 samples/sec   Loss 12.0732   LearningRate 0.0650   Epoch: 3   Global Step: 48130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:01,157-Speed 3055.21 samples/sec   Loss 11.8938   LearningRate 0.0650   Epoch: 3   Global Step: 48140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:04,502-Speed 3062.68 samples/sec   Loss 12.0383   LearningRate 0.0650   Epoch: 3   Global Step: 48150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:07,837-Speed 3071.59 samples/sec   Loss 12.0893   LearningRate 0.0650   Epoch: 3   Global Step: 48160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:11,163-Speed 3079.41 samples/sec   Loss 12.0812   LearningRate 0.0650   Epoch: 3   Global Step: 48170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:14,516-Speed 3054.93 samples/sec   Loss 11.9647   LearningRate 0.0650   Epoch: 3   Global Step: 48180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:17,873-Speed 3050.41 samples/sec   Loss 12.0607   LearningRate 0.0650   Epoch: 3   Global Step: 48190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:21,308-Speed 2982.43 samples/sec   Loss 11.9657   LearningRate 0.0650   Epoch: 3   Global Step: 48200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:24,586-Speed 3124.43 samples/sec   Loss 11.8875   LearningRate 0.0650   Epoch: 3   Global Step: 48210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:27,880-Speed 3109.63 samples/sec   Loss 11.8199   LearningRate 0.0649   Epoch: 3   Global Step: 48220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:31,282-Speed 3011.34 samples/sec   Loss 11.9958   LearningRate 0.0649   Epoch: 3   Global Step: 48230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:34,587-Speed 3098.65 samples/sec   Loss 11.8962   LearningRate 0.0649   Epoch: 3   Global Step: 48240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:37,911-Speed 3082.23 samples/sec   Loss 11.8996   LearningRate 0.0649   Epoch: 3   Global Step: 48250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:41,277-Speed 3042.89 samples/sec   Loss 12.1352   LearningRate 0.0649   Epoch: 3   Global Step: 48260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:44,581-Speed 3100.21 samples/sec   Loss 12.0876   LearningRate 0.0649   Epoch: 3   Global Step: 48270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:15:47,861-Speed 3123.44 samples/sec   Loss 11.9613   LearningRate 0.0649   Epoch: 3   Global Step: 48280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:15:51,165-Speed 3100.26 samples/sec   Loss 12.0102   LearningRate 0.0649   Epoch: 3   Global Step: 48290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:15:54,512-Speed 3060.48 samples/sec   Loss 11.9436   LearningRate 0.0649   Epoch: 3   Global Step: 48300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:15:57,858-Speed 3061.76 samples/sec   Loss 11.9274   LearningRate 0.0649   Epoch: 3   Global Step: 48310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:01,241-Speed 3027.76 samples/sec   Loss 11.8980   LearningRate 0.0649   Epoch: 3   Global Step: 48320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:04,593-Speed 3055.96 samples/sec   Loss 12.1108   LearningRate 0.0649   Epoch: 3   Global Step: 48330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:07,947-Speed 3053.67 samples/sec   Loss 11.9890   LearningRate 0.0649   Epoch: 3   Global Step: 48340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:11,267-Speed 3084.79 samples/sec   Loss 12.1710   LearningRate 0.0649   Epoch: 3   Global Step: 48350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:14,612-Speed 3062.61 samples/sec   Loss 12.1199   LearningRate 0.0649   Epoch: 3   Global Step: 48360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:17,987-Speed 3034.99 samples/sec   Loss 11.9555   LearningRate 0.0648   Epoch: 3   Global Step: 48370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:21,376-Speed 3021.89 samples/sec   Loss 11.9520   LearningRate 0.0648   Epoch: 3   Global Step: 48380   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:16:24,763-Speed 3024.10 samples/sec   Loss 12.0338   LearningRate 0.0648   Epoch: 3   Global Step: 48390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:16:28,126-Speed 3046.46 samples/sec   Loss 11.9849   LearningRate 0.0648   Epoch: 3   Global Step: 48400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:31,448-Speed 3083.45 samples/sec   Loss 12.2002   LearningRate 0.0648   Epoch: 3   Global Step: 48410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:34,766-Speed 3086.85 samples/sec   Loss 11.9756   LearningRate 0.0648   Epoch: 3   Global Step: 48420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:38,077-Speed 3094.00 samples/sec   Loss 11.9112   LearningRate 0.0648   Epoch: 3   Global Step: 48430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:41,421-Speed 3062.69 samples/sec   Loss 12.0586   LearningRate 0.0648   Epoch: 3   Global Step: 48440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:44,725-Speed 3101.05 samples/sec   Loss 12.0921   LearningRate 0.0648   Epoch: 3   Global Step: 48450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:48,132-Speed 3006.80 samples/sec   Loss 11.9494   LearningRate 0.0648   Epoch: 3   Global Step: 48460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:51,464-Speed 3073.83 samples/sec   Loss 12.0282   LearningRate 0.0648   Epoch: 3   Global Step: 48470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:54,914-Speed 2968.59 samples/sec   Loss 12.0514   LearningRate 0.0648   Epoch: 3   Global Step: 48480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:16:58,330-Speed 2998.92 samples/sec   Loss 12.0671   LearningRate 0.0648   Epoch: 3   Global Step: 48490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:01,709-Speed 3031.08 samples/sec   Loss 12.1015   LearningRate 0.0648   Epoch: 3   Global Step: 48500   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:17:05,033-Speed 3082.24 samples/sec   Loss 11.8893   LearningRate 0.0648   Epoch: 3   Global Step: 48510   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:17:08,437-Speed 3008.59 samples/sec   Loss 12.0235   LearningRate 0.0648   Epoch: 3   Global Step: 48520   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:17:11,753-Speed 3089.27 samples/sec   Loss 11.9666   LearningRate 0.0647   Epoch: 3   Global Step: 48530   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:17:15,095-Speed 3065.01 samples/sec   Loss 11.8148   LearningRate 0.0647   Epoch: 3   Global Step: 48540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:18,515-Speed 2995.16 samples/sec   Loss 12.0942   LearningRate 0.0647   Epoch: 3   Global Step: 48550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:21,891-Speed 3034.22 samples/sec   Loss 11.8147   LearningRate 0.0647   Epoch: 3   Global Step: 48560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:25,211-Speed 3085.41 samples/sec   Loss 11.8972   LearningRate 0.0647   Epoch: 3   Global Step: 48570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:28,538-Speed 3078.31 samples/sec   Loss 11.8683   LearningRate 0.0647   Epoch: 3   Global Step: 48580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:31,880-Speed 3064.44 samples/sec   Loss 11.9260   LearningRate 0.0647   Epoch: 3   Global Step: 48590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:35,280-Speed 3013.78 samples/sec   Loss 12.0573   LearningRate 0.0647   Epoch: 3   Global Step: 48600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:38,646-Speed 3043.39 samples/sec   Loss 11.9277   LearningRate 0.0647   Epoch: 3   Global Step: 48610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:41,995-Speed 3058.54 samples/sec   Loss 11.9320   LearningRate 0.0647   Epoch: 3   Global Step: 48620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:45,357-Speed 3046.34 samples/sec   Loss 12.0262   LearningRate 0.0647   Epoch: 3   Global Step: 48630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:48,723-Speed 3042.75 samples/sec   Loss 11.9681   LearningRate 0.0647   Epoch: 3   Global Step: 48640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:52,075-Speed 3056.23 samples/sec   Loss 11.9163   LearningRate 0.0647   Epoch: 3   Global Step: 48650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:55,454-Speed 3031.38 samples/sec   Loss 12.0751   LearningRate 0.0647   Epoch: 3   Global Step: 48660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:17:58,764-Speed 3094.37 samples/sec   Loss 11.9658   LearningRate 0.0647   Epoch: 3   Global Step: 48670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:18:02,081-Speed 3088.34 samples/sec   Loss 11.8607   LearningRate 0.0646   Epoch: 3   Global Step: 48680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:18:05,447-Speed 3042.78 samples/sec   Loss 11.9155   LearningRate 0.0646   Epoch: 3   Global Step: 48690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:18:08,809-Speed 3047.00 samples/sec   Loss 11.8850   LearningRate 0.0646   Epoch: 3   Global Step: 48700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:18:12,125-Speed 3088.96 samples/sec   Loss 11.9892   LearningRate 0.0646   Epoch: 3   Global Step: 48710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:18:15,446-Speed 3084.83 samples/sec   Loss 11.8988   LearningRate 0.0646   Epoch: 3   Global Step: 48720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:18:18,857-Speed 3002.52 samples/sec   Loss 11.9154   LearningRate 0.0646   Epoch: 3   Global Step: 48730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:18:22,272-Speed 2999.63 samples/sec   Loss 12.0619   LearningRate 0.0646   Epoch: 3   Global Step: 48740   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:18:25,632-Speed 3048.84 samples/sec   Loss 11.7943   LearningRate 0.0646   Epoch: 3   Global Step: 48750   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:18:28,939-Speed 3097.33 samples/sec   Loss 11.9614   LearningRate 0.0646   Epoch: 3   Global Step: 48760   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:18:32,239-Speed 3103.78 samples/sec   Loss 12.0207   LearningRate 0.0646   Epoch: 3   Global Step: 48770   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:18:35,570-Speed 3075.00 samples/sec   Loss 11.7383   LearningRate 0.0646   Epoch: 3   Global Step: 48780   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:18:38,911-Speed 3066.30 samples/sec   Loss 11.9627   LearningRate 0.0646   Epoch: 3   Global Step: 48790   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:18:42,245-Speed 3072.02 samples/sec   Loss 12.0579   LearningRate 0.0646   Epoch: 3   Global Step: 48800   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:18:45,538-Speed 3110.41 samples/sec   Loss 11.9105   LearningRate 0.0646   Epoch: 3   Global Step: 48810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:18:48,866-Speed 3078.38 samples/sec   Loss 11.8778   LearningRate 0.0646   Epoch: 3   Global Step: 48820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:18:52,259-Speed 3018.03 samples/sec   Loss 11.9723   LearningRate 0.0646   Epoch: 3   Global Step: 48830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:18:55,650-Speed 3021.75 samples/sec   Loss 11.9565   LearningRate 0.0645   Epoch: 3   Global Step: 48840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:18:59,094-Speed 2974.15 samples/sec   Loss 11.9586   LearningRate 0.0645   Epoch: 3   Global Step: 48850   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:19:02,487-Speed 3019.02 samples/sec   Loss 11.9363   LearningRate 0.0645   Epoch: 3   Global Step: 48860   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:19:05,814-Speed 3079.08 samples/sec   Loss 11.9814   LearningRate 0.0645   Epoch: 3   Global Step: 48870   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:19:09,118-Speed 3099.99 samples/sec   Loss 12.0615   LearningRate 0.0645   Epoch: 3   Global Step: 48880   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:19:12,491-Speed 3036.82 samples/sec   Loss 12.0541   LearningRate 0.0645   Epoch: 3   Global Step: 48890   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:19:15,836-Speed 3061.54 samples/sec   Loss 12.0175   LearningRate 0.0645   Epoch: 3   Global Step: 48900   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:19:19,201-Speed 3043.90 samples/sec   Loss 11.9344   LearningRate 0.0645   Epoch: 3   Global Step: 48910   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:19:22,633-Speed 2984.76 samples/sec   Loss 12.0083   LearningRate 0.0645   Epoch: 3   Global Step: 48920   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:19:25,952-Speed 3086.10 samples/sec   Loss 11.8750   LearningRate 0.0645   Epoch: 3   Global Step: 48930   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:19:29,284-Speed 3073.95 samples/sec   Loss 12.1422   LearningRate 0.0645   Epoch: 3   Global Step: 48940   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 06:19:32,652-Speed 3041.66 samples/sec   Loss 11.9305   LearningRate 0.0645   Epoch: 3   Global Step: 48950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:19:35,981-Speed 3077.22 samples/sec   Loss 11.9345   LearningRate 0.0645   Epoch: 3   Global Step: 48960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:19:39,357-Speed 3033.14 samples/sec   Loss 11.8726   LearningRate 0.0645   Epoch: 3   Global Step: 48970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:19:42,718-Speed 3048.27 samples/sec   Loss 11.9255   LearningRate 0.0645   Epoch: 3   Global Step: 48980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:19:46,140-Speed 2993.21 samples/sec   Loss 11.8440   LearningRate 0.0644   Epoch: 3   Global Step: 48990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:19:49,450-Speed 3094.21 samples/sec   Loss 11.9080   LearningRate 0.0644   Epoch: 3   Global Step: 49000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:19:52,820-Speed 3039.64 samples/sec   Loss 11.8940   LearningRate 0.0644   Epoch: 3   Global Step: 49010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:19:56,168-Speed 3059.66 samples/sec   Loss 11.9700   LearningRate 0.0644   Epoch: 3   Global Step: 49020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:19:59,548-Speed 3030.22 samples/sec   Loss 11.9914   LearningRate 0.0644   Epoch: 3   Global Step: 49030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:20:02,919-Speed 3038.53 samples/sec   Loss 12.1211   LearningRate 0.0644   Epoch: 3   Global Step: 49040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:20:06,321-Speed 3010.99 samples/sec   Loss 11.8753   LearningRate 0.0644   Epoch: 3   Global Step: 49050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:20:09,752-Speed 2985.65 samples/sec   Loss 11.9878   LearningRate 0.0644   Epoch: 3   Global Step: 49060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:20:13,053-Speed 3102.92 samples/sec   Loss 11.9194   LearningRate 0.0644   Epoch: 3   Global Step: 49070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:20:16,354-Speed 3103.12 samples/sec   Loss 11.7589   LearningRate 0.0644   Epoch: 3   Global Step: 49080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:20:19,739-Speed 3026.31 samples/sec   Loss 12.0237   LearningRate 0.0644   Epoch: 3   Global Step: 49090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:20:23,063-Speed 3081.35 samples/sec   Loss 11.9556   LearningRate 0.0644   Epoch: 3   Global Step: 49100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:20:26,458-Speed 3016.93 samples/sec   Loss 11.9139   LearningRate 0.0644   Epoch: 3   Global Step: 49110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:20:29,811-Speed 3054.85 samples/sec   Loss 11.9722   LearningRate 0.0644   Epoch: 3   Global Step: 49120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:20:33,198-Speed 3024.68 samples/sec   Loss 12.0197   LearningRate 0.0644   Epoch: 3   Global Step: 49130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:20:36,596-Speed 3014.38 samples/sec   Loss 11.8811   LearningRate 0.0644   Epoch: 3   Global Step: 49140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:20:39,911-Speed 3089.84 samples/sec   Loss 12.0120   LearningRate 0.0643   Epoch: 3   Global Step: 49150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:20:43,238-Speed 3078.84 samples/sec   Loss 11.9888   LearningRate 0.0643   Epoch: 3   Global Step: 49160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:20:46,629-Speed 3021.29 samples/sec   Loss 12.0609   LearningRate 0.0643   Epoch: 3   Global Step: 49170   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-04-27 06:20:49,985-Speed 3051.85 samples/sec   Loss 11.9287   LearningRate 0.0643   Epoch: 3   Global Step: 49180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:20:53,356-Speed 3038.13 samples/sec   Loss 12.0815   LearningRate 0.0643   Epoch: 3   Global Step: 49190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:20:56,677-Speed 3084.60 samples/sec   Loss 11.8564   LearningRate 0.0643   Epoch: 3   Global Step: 49200   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:21:00,069-Speed 3020.36 samples/sec   Loss 12.0223   LearningRate 0.0643   Epoch: 3   Global Step: 49210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:21:03,407-Speed 3067.87 samples/sec   Loss 11.9772   LearningRate 0.0643   Epoch: 3   Global Step: 49220   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:21:06,767-Speed 3049.34 samples/sec   Loss 11.8801   LearningRate 0.0643   Epoch: 3   Global Step: 49230   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:21:10,124-Speed 3050.87 samples/sec   Loss 11.9108   LearningRate 0.0643   Epoch: 3   Global Step: 49240   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:21:13,500-Speed 3034.49 samples/sec   Loss 11.9317   LearningRate 0.0643   Epoch: 3   Global Step: 49250   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:21:16,850-Speed 3057.31 samples/sec   Loss 12.1295   LearningRate 0.0643   Epoch: 3   Global Step: 49260   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:21:20,191-Speed 3066.14 samples/sec   Loss 11.9112   LearningRate 0.0643   Epoch: 3   Global Step: 49270   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:21:23,520-Speed 3076.22 samples/sec   Loss 12.0057   LearningRate 0.0643   Epoch: 3   Global Step: 49280   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:21:26,852-Speed 3074.69 samples/sec   Loss 11.8881   LearningRate 0.0643   Epoch: 3   Global Step: 49290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:21:30,197-Speed 3061.54 samples/sec   Loss 11.9553   LearningRate 0.0642   Epoch: 3   Global Step: 49300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:21:33,594-Speed 3015.11 samples/sec   Loss 12.0822   LearningRate 0.0642   Epoch: 3   Global Step: 49310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:21:36,953-Speed 3049.89 samples/sec   Loss 11.9296   LearningRate 0.0642   Epoch: 3   Global Step: 49320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:21:40,283-Speed 3075.99 samples/sec   Loss 12.0479   LearningRate 0.0642   Epoch: 3   Global Step: 49330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:21:43,690-Speed 3006.63 samples/sec   Loss 11.9228   LearningRate 0.0642   Epoch: 3   Global Step: 49340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:21:47,051-Speed 3047.55 samples/sec   Loss 11.9105   LearningRate 0.0642   Epoch: 3   Global Step: 49350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:21:50,447-Speed 3016.32 samples/sec   Loss 11.8460   LearningRate 0.0642   Epoch: 3   Global Step: 49360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:21:53,778-Speed 3074.41 samples/sec   Loss 12.0231   LearningRate 0.0642   Epoch: 3   Global Step: 49370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:21:57,153-Speed 3035.16 samples/sec   Loss 11.8866   LearningRate 0.0642   Epoch: 3   Global Step: 49380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:22:00,557-Speed 3009.07 samples/sec   Loss 11.9844   LearningRate 0.0642   Epoch: 3   Global Step: 49390   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:22:03,867-Speed 3094.94 samples/sec   Loss 11.7919   LearningRate 0.0642   Epoch: 3   Global Step: 49400   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:22:07,229-Speed 3046.61 samples/sec   Loss 11.8777   LearningRate 0.0642   Epoch: 3   Global Step: 49410   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:22:10,545-Speed 3089.01 samples/sec   Loss 11.7681   LearningRate 0.0642   Epoch: 3   Global Step: 49420   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:22:13,947-Speed 3010.62 samples/sec   Loss 11.8379   LearningRate 0.0642   Epoch: 3   Global Step: 49430   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:22:17,259-Speed 3092.95 samples/sec   Loss 11.7795   LearningRate 0.0642   Epoch: 3   Global Step: 49440   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:22:20,609-Speed 3057.24 samples/sec   Loss 11.9648   LearningRate 0.0642   Epoch: 3   Global Step: 49450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:22:24,034-Speed 2990.80 samples/sec   Loss 11.8449   LearningRate 0.0641   Epoch: 3   Global Step: 49460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:22:27,443-Speed 3004.38 samples/sec   Loss 11.8452   LearningRate 0.0641   Epoch: 3   Global Step: 49470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:22:30,757-Speed 3091.88 samples/sec   Loss 11.9846   LearningRate 0.0641   Epoch: 3   Global Step: 49480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:22:34,135-Speed 3031.68 samples/sec   Loss 11.9669   LearningRate 0.0641   Epoch: 3   Global Step: 49490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:22:37,562-Speed 2988.78 samples/sec   Loss 11.8749   LearningRate 0.0641   Epoch: 3   Global Step: 49500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:22:40,904-Speed 3065.34 samples/sec   Loss 11.9091   LearningRate 0.0641   Epoch: 3   Global Step: 49510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:22:44,237-Speed 3073.35 samples/sec   Loss 11.9416   LearningRate 0.0641   Epoch: 3   Global Step: 49520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:22:47,610-Speed 3036.74 samples/sec   Loss 11.8455   LearningRate 0.0641   Epoch: 3   Global Step: 49530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:22:50,951-Speed 3066.15 samples/sec   Loss 11.8635   LearningRate 0.0641   Epoch: 3   Global Step: 49540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:22:54,389-Speed 2979.02 samples/sec   Loss 11.9360   LearningRate 0.0641   Epoch: 3   Global Step: 49550   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:22:57,765-Speed 3034.53 samples/sec   Loss 12.0381   LearningRate 0.0641   Epoch: 3   Global Step: 49560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:23:01,184-Speed 2995.96 samples/sec   Loss 11.8689   LearningRate 0.0641   Epoch: 3   Global Step: 49570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:23:04,508-Speed 3080.51 samples/sec   Loss 11.9864   LearningRate 0.0641   Epoch: 3   Global Step: 49580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:23:07,790-Speed 3120.98 samples/sec   Loss 11.9076   LearningRate 0.0641   Epoch: 3   Global Step: 49590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:23:11,138-Speed 3060.38 samples/sec   Loss 11.9142   LearningRate 0.0641   Epoch: 3   Global Step: 49600   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:23:14,584-Speed 2972.15 samples/sec   Loss 12.0358   LearningRate 0.0640   Epoch: 3   Global Step: 49610   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:23:18,009-Speed 2991.27 samples/sec   Loss 12.0725   LearningRate 0.0640   Epoch: 3   Global Step: 49620   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:23:21,456-Speed 2971.05 samples/sec   Loss 12.0775   LearningRate 0.0640   Epoch: 3   Global Step: 49630   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:23:24,881-Speed 2990.79 samples/sec   Loss 11.8763   LearningRate 0.0640   Epoch: 3   Global Step: 49640   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:23:28,220-Speed 3068.35 samples/sec   Loss 11.8891   LearningRate 0.0640   Epoch: 3   Global Step: 49650   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:23:31,551-Speed 3075.48 samples/sec   Loss 11.8944   LearningRate 0.0640   Epoch: 3   Global Step: 49660   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:23:34,862-Speed 3093.67 samples/sec   Loss 11.8110   LearningRate 0.0640   Epoch: 3   Global Step: 49670   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:23:38,424-Speed 2875.35 samples/sec   Loss 12.0148   LearningRate 0.0640   Epoch: 3   Global Step: 49680   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:24:10,150-Speed 322.78 samples/sec   Loss 11.0676   LearningRate 0.0640   Epoch: 4   Global Step: 49690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:24:13,718-Speed 2870.86 samples/sec   Loss 10.5282   LearningRate 0.0640   Epoch: 4   Global Step: 49700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:24:17,004-Speed 3117.28 samples/sec   Loss 10.4182   LearningRate 0.0640   Epoch: 4   Global Step: 49710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:24:20,306-Speed 3102.77 samples/sec   Loss 10.3401   LearningRate 0.0640   Epoch: 4   Global Step: 49720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:24:23,550-Speed 3157.65 samples/sec   Loss 10.3440   LearningRate 0.0640   Epoch: 4   Global Step: 49730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:24:26,927-Speed 3033.29 samples/sec   Loss 10.4689   LearningRate 0.0640   Epoch: 4   Global Step: 49740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:24:30,204-Speed 3125.20 samples/sec   Loss 10.4103   LearningRate 0.0640   Epoch: 4   Global Step: 49750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:24:33,535-Speed 3075.88 samples/sec   Loss 10.3721   LearningRate 0.0640   Epoch: 4   Global Step: 49760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:24:36,838-Speed 3100.98 samples/sec   Loss 10.4029   LearningRate 0.0639   Epoch: 4   Global Step: 49770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:24:40,190-Speed 3056.54 samples/sec   Loss 10.5132   LearningRate 0.0639   Epoch: 4   Global Step: 49780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:24:43,586-Speed 3016.13 samples/sec   Loss 10.3815   LearningRate 0.0639   Epoch: 4   Global Step: 49790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:24:46,881-Speed 3108.63 samples/sec   Loss 10.4688   LearningRate 0.0639   Epoch: 4   Global Step: 49800   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:24:50,222-Speed 3066.66 samples/sec   Loss 10.6229   LearningRate 0.0639   Epoch: 4   Global Step: 49810   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:24:53,556-Speed 3071.83 samples/sec   Loss 10.4594   LearningRate 0.0639   Epoch: 4   Global Step: 49820   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:24:56,855-Speed 3105.29 samples/sec   Loss 10.4788   LearningRate 0.0639   Epoch: 4   Global Step: 49830   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:25:00,172-Speed 3088.30 samples/sec   Loss 10.6266   LearningRate 0.0639   Epoch: 4   Global Step: 49840   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:25:03,449-Speed 3125.85 samples/sec   Loss 10.4870   LearningRate 0.0639   Epoch: 4   Global Step: 49850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:06,815-Speed 3043.24 samples/sec   Loss 10.5963   LearningRate 0.0639   Epoch: 4   Global Step: 49860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:10,131-Speed 3088.94 samples/sec   Loss 10.5054   LearningRate 0.0639   Epoch: 4   Global Step: 49870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:13,406-Speed 3127.85 samples/sec   Loss 10.6787   LearningRate 0.0639   Epoch: 4   Global Step: 49880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:16,918-Speed 2916.43 samples/sec   Loss 10.5208   LearningRate 0.0639   Epoch: 4   Global Step: 49890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:20,277-Speed 3048.81 samples/sec   Loss 10.6117   LearningRate 0.0639   Epoch: 4   Global Step: 49900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:23,605-Speed 3078.77 samples/sec   Loss 10.4589   LearningRate 0.0639   Epoch: 4   Global Step: 49910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:27,077-Speed 2950.06 samples/sec   Loss 10.3768   LearningRate 0.0638   Epoch: 4   Global Step: 49920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:30,393-Speed 3089.17 samples/sec   Loss 10.3702   LearningRate 0.0638   Epoch: 4   Global Step: 49930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:33,715-Speed 3083.57 samples/sec   Loss 10.5707   LearningRate 0.0638   Epoch: 4   Global Step: 49940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:37,018-Speed 3101.05 samples/sec   Loss 10.3478   LearningRate 0.0638   Epoch: 4   Global Step: 49950   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:25:40,419-Speed 3011.58 samples/sec   Loss 10.4418   LearningRate 0.0638   Epoch: 4   Global Step: 49960   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:25:43,765-Speed 3061.68 samples/sec   Loss 10.5226   LearningRate 0.0638   Epoch: 4   Global Step: 49970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:47,093-Speed 3078.20 samples/sec   Loss 10.5069   LearningRate 0.0638   Epoch: 4   Global Step: 49980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:50,446-Speed 3054.20 samples/sec   Loss 10.6944   LearningRate 0.0638   Epoch: 4   Global Step: 49990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:53,837-Speed 3021.12 samples/sec   Loss 10.6058   LearningRate 0.0638   Epoch: 4   Global Step: 50000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:25:57,252-Speed 2999.59 samples/sec   Loss 10.5323   LearningRate 0.0638   Epoch: 4   Global Step: 50010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:26:00,577-Speed 3080.61 samples/sec   Loss 10.6741   LearningRate 0.0638   Epoch: 4   Global Step: 50020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:26:03,967-Speed 3022.62 samples/sec   Loss 10.6120   LearningRate 0.0638   Epoch: 4   Global Step: 50030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:26:07,328-Speed 3047.27 samples/sec   Loss 10.6727   LearningRate 0.0638   Epoch: 4   Global Step: 50040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:26:10,634-Speed 3098.39 samples/sec   Loss 10.7496   LearningRate 0.0638   Epoch: 4   Global Step: 50050   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:26:13,924-Speed 3113.62 samples/sec   Loss 10.6979   LearningRate 0.0638   Epoch: 4   Global Step: 50060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 06:26:17,198-Speed 3127.92 samples/sec   Loss 10.6649   LearningRate 0.0638   Epoch: 4   Global Step: 50070   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:26:20,502-Speed 3101.30 samples/sec   Loss 10.8206   LearningRate 0.0637   Epoch: 4   Global Step: 50080   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:26:23,868-Speed 3042.77 samples/sec   Loss 10.6953   LearningRate 0.0637   Epoch: 4   Global Step: 50090   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:26:27,259-Speed 3020.70 samples/sec   Loss 10.7809   LearningRate 0.0637   Epoch: 4   Global Step: 50100   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:26:30,596-Speed 3069.50 samples/sec   Loss 10.6507   LearningRate 0.0637   Epoch: 4   Global Step: 50110   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:26:33,975-Speed 3031.08 samples/sec   Loss 10.6695   LearningRate 0.0637   Epoch: 4   Global Step: 50120   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:26:37,314-Speed 3068.17 samples/sec   Loss 10.4377   LearningRate 0.0637   Epoch: 4   Global Step: 50130   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 06:26:40,653-Speed 3067.66 samples/sec   Loss 10.6130   LearningRate 0.0637   Epoch: 4   Global Step: 50140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:26:44,080-Speed 2988.57 samples/sec   Loss 10.7870   LearningRate 0.0637   Epoch: 4   Global Step: 50150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:26:47,355-Speed 3127.98 samples/sec   Loss 10.6142   LearningRate 0.0637   Epoch: 4   Global Step: 50160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:26:50,693-Speed 3068.54 samples/sec   Loss 10.7926   LearningRate 0.0637   Epoch: 4   Global Step: 50170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:26:54,035-Speed 3064.44 samples/sec   Loss 10.7994   LearningRate 0.0637   Epoch: 4   Global Step: 50180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:26:57,384-Speed 3058.85 samples/sec   Loss 10.8156   LearningRate 0.0637   Epoch: 4   Global Step: 50190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:27:00,740-Speed 3052.82 samples/sec   Loss 10.8625   LearningRate 0.0637   Epoch: 4   Global Step: 50200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:27:04,101-Speed 3047.15 samples/sec   Loss 10.7036   LearningRate 0.0637   Epoch: 4   Global Step: 50210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:27:07,482-Speed 3030.49 samples/sec   Loss 10.6692   LearningRate 0.0637   Epoch: 4   Global Step: 50220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:27:10,747-Speed 3137.51 samples/sec   Loss 10.7677   LearningRate 0.0636   Epoch: 4   Global Step: 50230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:27:14,110-Speed 3046.04 samples/sec   Loss 10.8734   LearningRate 0.0636   Epoch: 4   Global Step: 50240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:27:17,432-Speed 3082.99 samples/sec   Loss 10.7799   LearningRate 0.0636   Epoch: 4   Global Step: 50250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:27:20,740-Speed 3096.53 samples/sec   Loss 10.8928   LearningRate 0.0636   Epoch: 4   Global Step: 50260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:27:24,001-Speed 3141.55 samples/sec   Loss 10.8116   LearningRate 0.0636   Epoch: 4   Global Step: 50270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:27:27,361-Speed 3048.83 samples/sec   Loss 10.8954   LearningRate 0.0636   Epoch: 4   Global Step: 50280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:27:30,742-Speed 3029.49 samples/sec   Loss 10.9963   LearningRate 0.0636   Epoch: 4   Global Step: 50290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:27:34,043-Speed 3103.65 samples/sec   Loss 10.7886   LearningRate 0.0636   Epoch: 4   Global Step: 50300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:27:37,344-Speed 3103.29 samples/sec   Loss 10.7822   LearningRate 0.0636   Epoch: 4   Global Step: 50310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:27:40,724-Speed 3030.22 samples/sec   Loss 10.8244   LearningRate 0.0636   Epoch: 4   Global Step: 50320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:27:44,054-Speed 3076.41 samples/sec   Loss 10.9105   LearningRate 0.0636   Epoch: 4   Global Step: 50330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:27:47,389-Speed 3070.63 samples/sec   Loss 10.8206   LearningRate 0.0636   Epoch: 4   Global Step: 50340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:27:50,734-Speed 3062.57 samples/sec   Loss 11.0732   LearningRate 0.0636   Epoch: 4   Global Step: 50350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:27:54,055-Speed 3084.17 samples/sec   Loss 10.6925   LearningRate 0.0636   Epoch: 4   Global Step: 50360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:27:57,446-Speed 3020.78 samples/sec   Loss 10.9035   LearningRate 0.0636   Epoch: 4   Global Step: 50370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:00,875-Speed 2986.94 samples/sec   Loss 10.8431   LearningRate 0.0636   Epoch: 4   Global Step: 50380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:04,240-Speed 3044.26 samples/sec   Loss 10.9426   LearningRate 0.0635   Epoch: 4   Global Step: 50390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:07,580-Speed 3066.21 samples/sec   Loss 10.9983   LearningRate 0.0635   Epoch: 4   Global Step: 50400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:10,879-Speed 3105.19 samples/sec   Loss 10.9250   LearningRate 0.0635   Epoch: 4   Global Step: 50410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:14,293-Speed 3000.57 samples/sec   Loss 10.8605   LearningRate 0.0635   Epoch: 4   Global Step: 50420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:17,741-Speed 2970.54 samples/sec   Loss 10.8231   LearningRate 0.0635   Epoch: 4   Global Step: 50430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:21,169-Speed 2988.41 samples/sec   Loss 10.9182   LearningRate 0.0635   Epoch: 4   Global Step: 50440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:24,535-Speed 3043.09 samples/sec   Loss 10.8506   LearningRate 0.0635   Epoch: 4   Global Step: 50450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:27,960-Speed 2990.62 samples/sec   Loss 10.9320   LearningRate 0.0635   Epoch: 4   Global Step: 50460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:31,279-Speed 3085.86 samples/sec   Loss 11.0188   LearningRate 0.0635   Epoch: 4   Global Step: 50470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:34,676-Speed 3016.08 samples/sec   Loss 11.0753   LearningRate 0.0635   Epoch: 4   Global Step: 50480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:38,027-Speed 3056.64 samples/sec   Loss 11.1873   LearningRate 0.0635   Epoch: 4   Global Step: 50490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:41,355-Speed 3078.32 samples/sec   Loss 11.0564   LearningRate 0.0635   Epoch: 4   Global Step: 50500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:44,686-Speed 3075.45 samples/sec   Loss 11.0815   LearningRate 0.0635   Epoch: 4   Global Step: 50510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:48,071-Speed 3025.46 samples/sec   Loss 10.9153   LearningRate 0.0635   Epoch: 4   Global Step: 50520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:51,416-Speed 3063.05 samples/sec   Loss 11.0282   LearningRate 0.0635   Epoch: 4   Global Step: 50530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:54,712-Speed 3107.07 samples/sec   Loss 11.0750   LearningRate 0.0634   Epoch: 4   Global Step: 50540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:28:58,046-Speed 3072.85 samples/sec   Loss 11.0528   LearningRate 0.0634   Epoch: 4   Global Step: 50550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:29:01,376-Speed 3075.75 samples/sec   Loss 11.0616   LearningRate 0.0634   Epoch: 4   Global Step: 50560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:29:04,711-Speed 3072.43 samples/sec   Loss 11.0440   LearningRate 0.0634   Epoch: 4   Global Step: 50570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:29:08,073-Speed 3046.58 samples/sec   Loss 10.9424   LearningRate 0.0634   Epoch: 4   Global Step: 50580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:29:11,456-Speed 3027.61 samples/sec   Loss 11.0346   LearningRate 0.0634   Epoch: 4   Global Step: 50590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:29:14,758-Speed 3102.35 samples/sec   Loss 10.9380   LearningRate 0.0634   Epoch: 4   Global Step: 50600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:29:18,126-Speed 3041.38 samples/sec   Loss 11.0084   LearningRate 0.0634   Epoch: 4   Global Step: 50610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:29:21,490-Speed 3045.10 samples/sec   Loss 10.9701   LearningRate 0.0634   Epoch: 4   Global Step: 50620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:29:24,839-Speed 3057.90 samples/sec   Loss 11.0177   LearningRate 0.0634   Epoch: 4   Global Step: 50630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:29:28,236-Speed 3015.60 samples/sec   Loss 11.1063   LearningRate 0.0634   Epoch: 4   Global Step: 50640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:29:31,560-Speed 3081.29 samples/sec   Loss 10.9856   LearningRate 0.0634   Epoch: 4   Global Step: 50650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:29:34,933-Speed 3037.47 samples/sec   Loss 11.0476   LearningRate 0.0634   Epoch: 4   Global Step: 50660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:29:38,328-Speed 3016.54 samples/sec   Loss 11.0885   LearningRate 0.0634   Epoch: 4   Global Step: 50670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:29:41,720-Speed 3020.26 samples/sec   Loss 11.1328   LearningRate 0.0634   Epoch: 4   Global Step: 50680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:29:45,021-Speed 3102.69 samples/sec   Loss 11.0660   LearningRate 0.0634   Epoch: 4   Global Step: 50690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:29:48,285-Speed 3137.90 samples/sec   Loss 11.0669   LearningRate 0.0633   Epoch: 4   Global Step: 50700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:29:51,639-Speed 3053.89 samples/sec   Loss 11.0864   LearningRate 0.0633   Epoch: 4   Global Step: 50710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:29:55,016-Speed 3033.94 samples/sec   Loss 11.0763   LearningRate 0.0633   Epoch: 4   Global Step: 50720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:29:58,297-Speed 3121.08 samples/sec   Loss 11.1872   LearningRate 0.0633   Epoch: 4   Global Step: 50730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:30:01,647-Speed 3057.87 samples/sec   Loss 11.1234   LearningRate 0.0633   Epoch: 4   Global Step: 50740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:30:05,014-Speed 3041.97 samples/sec   Loss 11.1439   LearningRate 0.0633   Epoch: 4   Global Step: 50750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:30:08,367-Speed 3055.40 samples/sec   Loss 11.1688   LearningRate 0.0633   Epoch: 4   Global Step: 50760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:30:11,694-Speed 3078.86 samples/sec   Loss 11.1679   LearningRate 0.0633   Epoch: 4   Global Step: 50770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:30:15,116-Speed 2993.70 samples/sec   Loss 11.0803   LearningRate 0.0633   Epoch: 4   Global Step: 50780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:30:18,441-Speed 3082.71 samples/sec   Loss 11.0808   LearningRate 0.0633   Epoch: 4   Global Step: 50790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:30:21,784-Speed 3063.83 samples/sec   Loss 11.3074   LearningRate 0.0633   Epoch: 4   Global Step: 50800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:30:25,138-Speed 3054.02 samples/sec   Loss 11.0956   LearningRate 0.0633   Epoch: 4   Global Step: 50810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:30:28,523-Speed 3026.28 samples/sec   Loss 11.3605   LearningRate 0.0633   Epoch: 4   Global Step: 50820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:30:31,864-Speed 3066.04 samples/sec   Loss 11.0976   LearningRate 0.0633   Epoch: 4   Global Step: 50830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:30:35,225-Speed 3047.83 samples/sec   Loss 11.2913   LearningRate 0.0633   Epoch: 4   Global Step: 50840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:30:38,627-Speed 3010.64 samples/sec   Loss 11.0790   LearningRate 0.0633   Epoch: 4   Global Step: 50850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:30:42,064-Speed 2980.19 samples/sec   Loss 11.0554   LearningRate 0.0632   Epoch: 4   Global Step: 50860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:30:45,456-Speed 3019.06 samples/sec   Loss 11.2641   LearningRate 0.0632   Epoch: 4   Global Step: 50870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:30:48,846-Speed 3021.87 samples/sec   Loss 11.2661   LearningRate 0.0632   Epoch: 4   Global Step: 50880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:30:52,144-Speed 3105.86 samples/sec   Loss 11.0307   LearningRate 0.0632   Epoch: 4   Global Step: 50890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:30:55,445-Speed 3103.49 samples/sec   Loss 11.0827   LearningRate 0.0632   Epoch: 4   Global Step: 50900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:30:58,773-Speed 3077.23 samples/sec   Loss 11.1074   LearningRate 0.0632   Epoch: 4   Global Step: 50910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:31:02,069-Speed 3108.14 samples/sec   Loss 11.0001   LearningRate 0.0632   Epoch: 4   Global Step: 50920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:31:05,437-Speed 3041.52 samples/sec   Loss 11.2092   LearningRate 0.0632   Epoch: 4   Global Step: 50930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:31:08,762-Speed 3080.92 samples/sec   Loss 11.3024   LearningRate 0.0632   Epoch: 4   Global Step: 50940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:31:12,173-Speed 3003.14 samples/sec   Loss 11.3352   LearningRate 0.0632   Epoch: 4   Global Step: 50950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:31:15,554-Speed 3028.85 samples/sec   Loss 11.2500   LearningRate 0.0632   Epoch: 4   Global Step: 50960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:31:18,850-Speed 3108.71 samples/sec   Loss 11.0118   LearningRate 0.0632   Epoch: 4   Global Step: 50970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:31:22,225-Speed 3034.48 samples/sec   Loss 11.1825   LearningRate 0.0632   Epoch: 4   Global Step: 50980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:31:25,609-Speed 3027.51 samples/sec   Loss 11.3511   LearningRate 0.0632   Epoch: 4   Global Step: 50990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:31:29,026-Speed 2997.28 samples/sec   Loss 11.3320   LearningRate 0.0632   Epoch: 4   Global Step: 51000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:31:32,412-Speed 3025.91 samples/sec   Loss 11.1758   LearningRate 0.0631   Epoch: 4   Global Step: 51010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:31:35,768-Speed 3052.11 samples/sec   Loss 11.2056   LearningRate 0.0631   Epoch: 4   Global Step: 51020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:31:39,110-Speed 3064.25 samples/sec   Loss 11.2543   LearningRate 0.0631   Epoch: 4   Global Step: 51030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:31:42,479-Speed 3040.55 samples/sec   Loss 11.2878   LearningRate 0.0631   Epoch: 4   Global Step: 51040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:31:45,809-Speed 3076.10 samples/sec   Loss 11.2499   LearningRate 0.0631   Epoch: 4   Global Step: 51050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:31:49,179-Speed 3039.64 samples/sec   Loss 11.3215   LearningRate 0.0631   Epoch: 4   Global Step: 51060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:31:52,563-Speed 3027.03 samples/sec   Loss 11.2938   LearningRate 0.0631   Epoch: 4   Global Step: 51070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:31:55,993-Speed 2985.96 samples/sec   Loss 11.3209   LearningRate 0.0631   Epoch: 4   Global Step: 51080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:31:59,408-Speed 3000.03 samples/sec   Loss 11.4735   LearningRate 0.0631   Epoch: 4   Global Step: 51090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:02,802-Speed 3018.67 samples/sec   Loss 11.3373   LearningRate 0.0631   Epoch: 4   Global Step: 51100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:06,185-Speed 3027.59 samples/sec   Loss 11.2587   LearningRate 0.0631   Epoch: 4   Global Step: 51110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:09,535-Speed 3057.26 samples/sec   Loss 11.2616   LearningRate 0.0631   Epoch: 4   Global Step: 51120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:12,883-Speed 3059.75 samples/sec   Loss 11.1369   LearningRate 0.0631   Epoch: 4   Global Step: 51130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:16,197-Speed 3090.55 samples/sec   Loss 11.4007   LearningRate 0.0631   Epoch: 4   Global Step: 51140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:19,552-Speed 3053.69 samples/sec   Loss 11.2803   LearningRate 0.0631   Epoch: 4   Global Step: 51150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:22,865-Speed 3091.43 samples/sec   Loss 11.2991   LearningRate 0.0631   Epoch: 4   Global Step: 51160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:26,153-Speed 3115.64 samples/sec   Loss 11.1851   LearningRate 0.0630   Epoch: 4   Global Step: 51170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:29,529-Speed 3034.24 samples/sec   Loss 11.2745   LearningRate 0.0630   Epoch: 4   Global Step: 51180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:32,916-Speed 3024.44 samples/sec   Loss 11.4857   LearningRate 0.0630   Epoch: 4   Global Step: 51190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:36,260-Speed 3063.10 samples/sec   Loss 11.2209   LearningRate 0.0630   Epoch: 4   Global Step: 51200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:39,669-Speed 3005.03 samples/sec   Loss 11.3500   LearningRate 0.0630   Epoch: 4   Global Step: 51210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:42,973-Speed 3100.55 samples/sec   Loss 11.3563   LearningRate 0.0630   Epoch: 4   Global Step: 51220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:46,315-Speed 3065.07 samples/sec   Loss 11.2981   LearningRate 0.0630   Epoch: 4   Global Step: 51230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:49,673-Speed 3050.30 samples/sec   Loss 11.3745   LearningRate 0.0630   Epoch: 4   Global Step: 51240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:53,022-Speed 3059.11 samples/sec   Loss 11.3323   LearningRate 0.0630   Epoch: 4   Global Step: 51250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:56,425-Speed 3010.28 samples/sec   Loss 11.2998   LearningRate 0.0630   Epoch: 4   Global Step: 51260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:32:59,851-Speed 2989.97 samples/sec   Loss 11.4462   LearningRate 0.0630   Epoch: 4   Global Step: 51270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:03,207-Speed 3051.73 samples/sec   Loss 11.1832   LearningRate 0.0630   Epoch: 4   Global Step: 51280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:06,556-Speed 3059.31 samples/sec   Loss 11.2283   LearningRate 0.0630   Epoch: 4   Global Step: 51290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:09,989-Speed 2983.61 samples/sec   Loss 11.4069   LearningRate 0.0630   Epoch: 4   Global Step: 51300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:13,408-Speed 2995.32 samples/sec   Loss 11.2544   LearningRate 0.0630   Epoch: 4   Global Step: 51310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:16,790-Speed 3029.65 samples/sec   Loss 11.3990   LearningRate 0.0630   Epoch: 4   Global Step: 51320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:20,136-Speed 3061.58 samples/sec   Loss 11.4037   LearningRate 0.0629   Epoch: 4   Global Step: 51330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:23,474-Speed 3068.37 samples/sec   Loss 11.2165   LearningRate 0.0629   Epoch: 4   Global Step: 51340   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-27 06:33:26,825-Speed 3057.00 samples/sec   Loss 11.3671   LearningRate 0.0629   Epoch: 4   Global Step: 51350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:30,235-Speed 3003.84 samples/sec   Loss 11.4477   LearningRate 0.0629   Epoch: 4   Global Step: 51360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:33,536-Speed 3103.00 samples/sec   Loss 11.3713   LearningRate 0.0629   Epoch: 4   Global Step: 51370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:36,852-Speed 3088.81 samples/sec   Loss 11.3408   LearningRate 0.0629   Epoch: 4   Global Step: 51380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:40,185-Speed 3073.37 samples/sec   Loss 11.2923   LearningRate 0.0629   Epoch: 4   Global Step: 51390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:43,604-Speed 2995.65 samples/sec   Loss 11.3757   LearningRate 0.0629   Epoch: 4   Global Step: 51400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:46,917-Speed 3091.33 samples/sec   Loss 11.4027   LearningRate 0.0629   Epoch: 4   Global Step: 51410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:50,283-Speed 3043.70 samples/sec   Loss 11.4656   LearningRate 0.0629   Epoch: 4   Global Step: 51420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:53,574-Speed 3112.55 samples/sec   Loss 11.3181   LearningRate 0.0629   Epoch: 4   Global Step: 51430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:33:56,880-Speed 3098.18 samples/sec   Loss 11.1800   LearningRate 0.0629   Epoch: 4   Global Step: 51440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:00,228-Speed 3059.65 samples/sec   Loss 11.4256   LearningRate 0.0629   Epoch: 4   Global Step: 51450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:03,586-Speed 3050.18 samples/sec   Loss 11.3915   LearningRate 0.0629   Epoch: 4   Global Step: 51460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:06,980-Speed 3017.95 samples/sec   Loss 11.4319   LearningRate 0.0629   Epoch: 4   Global Step: 51470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:10,315-Speed 3071.09 samples/sec   Loss 11.5441   LearningRate 0.0628   Epoch: 4   Global Step: 51480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:13,662-Speed 3059.99 samples/sec   Loss 11.3398   LearningRate 0.0628   Epoch: 4   Global Step: 51490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:17,049-Speed 3024.56 samples/sec   Loss 11.4796   LearningRate 0.0628   Epoch: 4   Global Step: 51500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:20,433-Speed 3027.61 samples/sec   Loss 11.4263   LearningRate 0.0628   Epoch: 4   Global Step: 51510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:23,850-Speed 2997.60 samples/sec   Loss 11.4646   LearningRate 0.0628   Epoch: 4   Global Step: 51520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:27,211-Speed 3047.44 samples/sec   Loss 11.3640   LearningRate 0.0628   Epoch: 4   Global Step: 51530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:30,623-Speed 3002.17 samples/sec   Loss 11.2918   LearningRate 0.0628   Epoch: 4   Global Step: 51540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:33,960-Speed 3069.67 samples/sec   Loss 11.4209   LearningRate 0.0628   Epoch: 4   Global Step: 51550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:37,325-Speed 3044.10 samples/sec   Loss 11.5225   LearningRate 0.0628   Epoch: 4   Global Step: 51560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:34:40,696-Speed 3038.72 samples/sec   Loss 11.4546   LearningRate 0.0628   Epoch: 4   Global Step: 51570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:34:44,029-Speed 3072.54 samples/sec   Loss 11.3316   LearningRate 0.0628   Epoch: 4   Global Step: 51580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:34:47,447-Speed 2997.01 samples/sec   Loss 11.5013   LearningRate 0.0628   Epoch: 4   Global Step: 51590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:34:50,765-Speed 3087.05 samples/sec   Loss 11.2580   LearningRate 0.0628   Epoch: 4   Global Step: 51600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:34:54,174-Speed 3004.67 samples/sec   Loss 11.4342   LearningRate 0.0628   Epoch: 4   Global Step: 51610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:34:57,548-Speed 3036.51 samples/sec   Loss 11.3983   LearningRate 0.0628   Epoch: 4   Global Step: 51620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:35:00,876-Speed 3077.83 samples/sec   Loss 11.4574   LearningRate 0.0628   Epoch: 4   Global Step: 51630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:35:04,281-Speed 3007.86 samples/sec   Loss 11.6563   LearningRate 0.0627   Epoch: 4   Global Step: 51640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:35:07,609-Speed 3077.58 samples/sec   Loss 11.5676   LearningRate 0.0627   Epoch: 4   Global Step: 51650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:35:10,942-Speed 3072.97 samples/sec   Loss 11.5249   LearningRate 0.0627   Epoch: 4   Global Step: 51660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:35:14,345-Speed 3010.31 samples/sec   Loss 11.5381   LearningRate 0.0627   Epoch: 4   Global Step: 51670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:35:17,739-Speed 3018.12 samples/sec   Loss 11.4557   LearningRate 0.0627   Epoch: 4   Global Step: 51680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:35:21,054-Speed 3089.22 samples/sec   Loss 11.2974   LearningRate 0.0627   Epoch: 4   Global Step: 51690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:35:24,345-Speed 3112.49 samples/sec   Loss 11.4376   LearningRate 0.0627   Epoch: 4   Global Step: 51700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:35:27,690-Speed 3062.23 samples/sec   Loss 11.3306   LearningRate 0.0627   Epoch: 4   Global Step: 51710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:35:31,043-Speed 3055.63 samples/sec   Loss 11.5178   LearningRate 0.0627   Epoch: 4   Global Step: 51720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:35:34,410-Speed 3041.82 samples/sec   Loss 11.5115   LearningRate 0.0627   Epoch: 4   Global Step: 51730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:35:37,744-Speed 3073.02 samples/sec   Loss 11.3897   LearningRate 0.0627   Epoch: 4   Global Step: 51740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:35:41,070-Speed 3079.65 samples/sec   Loss 11.4555   LearningRate 0.0627   Epoch: 4   Global Step: 51750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:35:44,468-Speed 3015.01 samples/sec   Loss 11.6378   LearningRate 0.0627   Epoch: 4   Global Step: 51760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:35:47,768-Speed 3103.98 samples/sec   Loss 11.5636   LearningRate 0.0627   Epoch: 4   Global Step: 51770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:35:51,085-Speed 3088.15 samples/sec   Loss 11.3124   LearningRate 0.0627   Epoch: 4   Global Step: 51780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:35:54,480-Speed 3016.55 samples/sec   Loss 11.5098   LearningRate 0.0627   Epoch: 4   Global Step: 51790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:35:57,909-Speed 2987.56 samples/sec   Loss 11.6361   LearningRate 0.0626   Epoch: 4   Global Step: 51800   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:36:01,238-Speed 3077.28 samples/sec   Loss 11.3764   LearningRate 0.0626   Epoch: 4   Global Step: 51810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:36:04,679-Speed 2977.00 samples/sec   Loss 11.5924   LearningRate 0.0626   Epoch: 4   Global Step: 51820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:36:07,994-Speed 3089.42 samples/sec   Loss 11.5526   LearningRate 0.0626   Epoch: 4   Global Step: 51830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:36:11,318-Speed 3081.43 samples/sec   Loss 11.3979   LearningRate 0.0626   Epoch: 4   Global Step: 51840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:36:14,616-Speed 3106.02 samples/sec   Loss 11.3544   LearningRate 0.0626   Epoch: 4   Global Step: 51850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:36:17,943-Speed 3078.70 samples/sec   Loss 11.4557   LearningRate 0.0626   Epoch: 4   Global Step: 51860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:36:21,276-Speed 3073.06 samples/sec   Loss 11.5899   LearningRate 0.0626   Epoch: 4   Global Step: 51870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:36:24,702-Speed 2990.17 samples/sec   Loss 11.5094   LearningRate 0.0626   Epoch: 4   Global Step: 51880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:36:28,037-Speed 3071.37 samples/sec   Loss 11.6333   LearningRate 0.0626   Epoch: 4   Global Step: 51890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:36:31,353-Speed 3089.19 samples/sec   Loss 11.5299   LearningRate 0.0626   Epoch: 4   Global Step: 51900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:36:34,661-Speed 3096.48 samples/sec   Loss 11.5350   LearningRate 0.0626   Epoch: 4   Global Step: 51910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:36:37,993-Speed 3073.74 samples/sec   Loss 11.5738   LearningRate 0.0626   Epoch: 4   Global Step: 51920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:36:41,342-Speed 3059.04 samples/sec   Loss 11.4182   LearningRate 0.0626   Epoch: 4   Global Step: 51930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:36:44,689-Speed 3059.76 samples/sec   Loss 11.6173   LearningRate 0.0626   Epoch: 4   Global Step: 51940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:36:48,060-Speed 3039.21 samples/sec   Loss 11.5498   LearningRate 0.0625   Epoch: 4   Global Step: 51950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:36:51,403-Speed 3063.52 samples/sec   Loss 11.3173   LearningRate 0.0625   Epoch: 4   Global Step: 51960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:36:54,732-Speed 3077.15 samples/sec   Loss 11.4753   LearningRate 0.0625   Epoch: 4   Global Step: 51970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:36:58,030-Speed 3106.21 samples/sec   Loss 11.4384   LearningRate 0.0625   Epoch: 4   Global Step: 51980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:37:01,381-Speed 3056.22 samples/sec   Loss 11.5215   LearningRate 0.0625   Epoch: 4   Global Step: 51990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:37:04,725-Speed 3062.98 samples/sec   Loss 11.4861   LearningRate 0.0625   Epoch: 4   Global Step: 52000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:08,078-Speed 3055.06 samples/sec   Loss 11.4924   LearningRate 0.0625   Epoch: 4   Global Step: 52010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:11,513-Speed 2982.13 samples/sec   Loss 11.5119   LearningRate 0.0625   Epoch: 4   Global Step: 52020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:14,903-Speed 3021.65 samples/sec   Loss 11.3850   LearningRate 0.0625   Epoch: 4   Global Step: 52030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:18,235-Speed 3073.66 samples/sec   Loss 11.4635   LearningRate 0.0625   Epoch: 4   Global Step: 52040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:21,548-Speed 3091.90 samples/sec   Loss 11.5454   LearningRate 0.0625   Epoch: 4   Global Step: 52050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:24,884-Speed 3070.41 samples/sec   Loss 11.3607   LearningRate 0.0625   Epoch: 4   Global Step: 52060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:28,287-Speed 3010.39 samples/sec   Loss 11.4421   LearningRate 0.0625   Epoch: 4   Global Step: 52070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:31,602-Speed 3091.05 samples/sec   Loss 11.6396   LearningRate 0.0625   Epoch: 4   Global Step: 52080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:34,965-Speed 3045.27 samples/sec   Loss 11.3714   LearningRate 0.0625   Epoch: 4   Global Step: 52090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:38,353-Speed 3023.71 samples/sec   Loss 11.6197   LearningRate 0.0625   Epoch: 4   Global Step: 52100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:37:41,721-Speed 3041.22 samples/sec   Loss 11.6068   LearningRate 0.0624   Epoch: 4   Global Step: 52110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:45,020-Speed 3104.82 samples/sec   Loss 11.4673   LearningRate 0.0624   Epoch: 4   Global Step: 52120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:48,378-Speed 3050.80 samples/sec   Loss 11.4723   LearningRate 0.0624   Epoch: 4   Global Step: 52130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:51,759-Speed 3029.05 samples/sec   Loss 11.5202   LearningRate 0.0624   Epoch: 4   Global Step: 52140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:55,160-Speed 3012.21 samples/sec   Loss 11.6172   LearningRate 0.0624   Epoch: 4   Global Step: 52150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:37:58,487-Speed 3078.40 samples/sec   Loss 11.4489   LearningRate 0.0624   Epoch: 4   Global Step: 52160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:38:01,876-Speed 3022.45 samples/sec   Loss 11.6203   LearningRate 0.0624   Epoch: 4   Global Step: 52170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:38:05,204-Speed 3078.08 samples/sec   Loss 11.3931   LearningRate 0.0624   Epoch: 4   Global Step: 52180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:38:08,556-Speed 3056.35 samples/sec   Loss 11.5527   LearningRate 0.0624   Epoch: 4   Global Step: 52190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:38:11,933-Speed 3032.81 samples/sec   Loss 11.5370   LearningRate 0.0624   Epoch: 4   Global Step: 52200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:38:15,271-Speed 3069.24 samples/sec   Loss 11.4970   LearningRate 0.0624   Epoch: 4   Global Step: 52210   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 06:38:18,747-Speed 2946.82 samples/sec   Loss 11.4302   LearningRate 0.0624   Epoch: 4   Global Step: 52220   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 06:38:22,087-Speed 3066.63 samples/sec   Loss 11.5237   LearningRate 0.0624   Epoch: 4   Global Step: 52230   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 06:38:25,444-Speed 3051.12 samples/sec   Loss 11.6305   LearningRate 0.0624   Epoch: 4   Global Step: 52240   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 06:38:28,768-Speed 3082.14 samples/sec   Loss 11.4836   LearningRate 0.0624   Epoch: 4   Global Step: 52250   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 06:38:32,152-Speed 3026.81 samples/sec   Loss 11.3285   LearningRate 0.0624   Epoch: 4   Global Step: 52260   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 06:38:35,523-Speed 3039.12 samples/sec   Loss 11.7099   LearningRate 0.0623   Epoch: 4   Global Step: 52270   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 06:38:38,927-Speed 3008.86 samples/sec   Loss 11.3358   LearningRate 0.0623   Epoch: 4   Global Step: 52280   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 06:38:42,309-Speed 3028.85 samples/sec   Loss 11.5137   LearningRate 0.0623   Epoch: 4   Global Step: 52290   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 06:38:45,773-Speed 2956.48 samples/sec   Loss 11.5053   LearningRate 0.0623   Epoch: 4   Global Step: 52300   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 06:38:49,206-Speed 2983.36 samples/sec   Loss 11.5574   LearningRate 0.0623   Epoch: 4   Global Step: 52310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:38:52,606-Speed 3013.69 samples/sec   Loss 11.6793   LearningRate 0.0623   Epoch: 4   Global Step: 52320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:38:55,912-Speed 3097.70 samples/sec   Loss 11.7196   LearningRate 0.0623   Epoch: 4   Global Step: 52330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:38:59,244-Speed 3074.36 samples/sec   Loss 11.6159   LearningRate 0.0623   Epoch: 4   Global Step: 52340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:39:02,579-Speed 3071.36 samples/sec   Loss 11.3768   LearningRate 0.0623   Epoch: 4   Global Step: 52350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:39:05,921-Speed 3065.62 samples/sec   Loss 11.5131   LearningRate 0.0623   Epoch: 4   Global Step: 52360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:39:09,320-Speed 3013.25 samples/sec   Loss 11.4978   LearningRate 0.0623   Epoch: 4   Global Step: 52370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:39:12,740-Speed 2995.20 samples/sec   Loss 11.4717   LearningRate 0.0623   Epoch: 4   Global Step: 52380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:39:16,184-Speed 2973.89 samples/sec   Loss 11.5423   LearningRate 0.0623   Epoch: 4   Global Step: 52390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:39:19,639-Speed 2964.89 samples/sec   Loss 11.5521   LearningRate 0.0623   Epoch: 4   Global Step: 52400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:39:23,059-Speed 2995.13 samples/sec   Loss 11.6368   LearningRate 0.0623   Epoch: 4   Global Step: 52410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:39:26,403-Speed 3062.90 samples/sec   Loss 11.5966   LearningRate 0.0622   Epoch: 4   Global Step: 52420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:39:29,847-Speed 2974.30 samples/sec   Loss 11.5709   LearningRate 0.0622   Epoch: 4   Global Step: 52430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:39:33,238-Speed 3020.30 samples/sec   Loss 11.6526   LearningRate 0.0622   Epoch: 4   Global Step: 52440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:39:36,629-Speed 3021.44 samples/sec   Loss 11.4652   LearningRate 0.0622   Epoch: 4   Global Step: 52450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:39:39,989-Speed 3048.73 samples/sec   Loss 11.6241   LearningRate 0.0622   Epoch: 4   Global Step: 52460   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:39:43,354-Speed 3043.08 samples/sec   Loss 11.5563   LearningRate 0.0622   Epoch: 4   Global Step: 52470   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:39:46,697-Speed 3063.98 samples/sec   Loss 11.4757   LearningRate 0.0622   Epoch: 4   Global Step: 52480   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:39:50,078-Speed 3029.45 samples/sec   Loss 11.4553   LearningRate 0.0622   Epoch: 4   Global Step: 52490   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:39:53,470-Speed 3019.94 samples/sec   Loss 11.5550   LearningRate 0.0622   Epoch: 4   Global Step: 52500   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:39:56,832-Speed 3046.45 samples/sec   Loss 11.5511   LearningRate 0.0622   Epoch: 4   Global Step: 52510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:40:00,159-Speed 3078.76 samples/sec   Loss 11.5237   LearningRate 0.0622   Epoch: 4   Global Step: 52520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:40:03,553-Speed 3018.49 samples/sec   Loss 11.6642   LearningRate 0.0622   Epoch: 4   Global Step: 52530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:40:06,868-Speed 3089.18 samples/sec   Loss 11.6161   LearningRate 0.0622   Epoch: 4   Global Step: 52540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:40:10,176-Speed 3096.67 samples/sec   Loss 11.6641   LearningRate 0.0622   Epoch: 4   Global Step: 52550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:40:13,486-Speed 3094.80 samples/sec   Loss 11.6242   LearningRate 0.0622   Epoch: 4   Global Step: 52560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:40:16,882-Speed 3015.58 samples/sec   Loss 11.5397   LearningRate 0.0622   Epoch: 4   Global Step: 52570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:40:20,287-Speed 3008.57 samples/sec   Loss 11.6073   LearningRate 0.0621   Epoch: 4   Global Step: 52580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:40:23,693-Speed 3007.82 samples/sec   Loss 11.5848   LearningRate 0.0621   Epoch: 4   Global Step: 52590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:40:27,070-Speed 3032.78 samples/sec   Loss 11.5148   LearningRate 0.0621   Epoch: 4   Global Step: 52600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:40:30,534-Speed 2957.39 samples/sec   Loss 11.5442   LearningRate 0.0621   Epoch: 4   Global Step: 52610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:40:33,837-Speed 3100.74 samples/sec   Loss 11.3956   LearningRate 0.0621   Epoch: 4   Global Step: 52620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:40:37,216-Speed 3030.85 samples/sec   Loss 11.4521   LearningRate 0.0621   Epoch: 4   Global Step: 52630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:40:40,675-Speed 2961.70 samples/sec   Loss 11.6088   LearningRate 0.0621   Epoch: 4   Global Step: 52640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:40:44,027-Speed 3056.23 samples/sec   Loss 11.4317   LearningRate 0.0621   Epoch: 4   Global Step: 52650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:40:47,391-Speed 3044.38 samples/sec   Loss 11.4772   LearningRate 0.0621   Epoch: 4   Global Step: 52660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:40:50,731-Speed 3067.19 samples/sec   Loss 11.4311   LearningRate 0.0621   Epoch: 4   Global Step: 52670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:40:54,103-Speed 3037.52 samples/sec   Loss 11.5573   LearningRate 0.0621   Epoch: 4   Global Step: 52680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:40:57,483-Speed 3029.99 samples/sec   Loss 11.6076   LearningRate 0.0621   Epoch: 4   Global Step: 52690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:00,841-Speed 3049.87 samples/sec   Loss 11.4488   LearningRate 0.0621   Epoch: 4   Global Step: 52700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:04,174-Speed 3073.88 samples/sec   Loss 11.6093   LearningRate 0.0621   Epoch: 4   Global Step: 52710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:07,565-Speed 3020.47 samples/sec   Loss 11.4824   LearningRate 0.0621   Epoch: 4   Global Step: 52720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:10,893-Speed 3078.12 samples/sec   Loss 11.3490   LearningRate 0.0621   Epoch: 4   Global Step: 52730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:14,192-Speed 3105.12 samples/sec   Loss 11.5135   LearningRate 0.0620   Epoch: 4   Global Step: 52740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:17,577-Speed 3025.69 samples/sec   Loss 11.7010   LearningRate 0.0620   Epoch: 4   Global Step: 52750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:20,969-Speed 3019.73 samples/sec   Loss 11.5087   LearningRate 0.0620   Epoch: 4   Global Step: 52760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:24,413-Speed 2974.10 samples/sec   Loss 11.7607   LearningRate 0.0620   Epoch: 4   Global Step: 52770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:27,815-Speed 3010.89 samples/sec   Loss 11.5344   LearningRate 0.0620   Epoch: 4   Global Step: 52780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:31,183-Speed 3041.16 samples/sec   Loss 11.4367   LearningRate 0.0620   Epoch: 4   Global Step: 52790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:34,545-Speed 3047.31 samples/sec   Loss 11.4832   LearningRate 0.0620   Epoch: 4   Global Step: 52800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:37,900-Speed 3052.74 samples/sec   Loss 11.5413   LearningRate 0.0620   Epoch: 4   Global Step: 52810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:41,203-Speed 3101.38 samples/sec   Loss 11.6134   LearningRate 0.0620   Epoch: 4   Global Step: 52820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:41:44,504-Speed 3102.35 samples/sec   Loss 11.4772   LearningRate 0.0620   Epoch: 4   Global Step: 52830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:41:47,805-Speed 3103.83 samples/sec   Loss 11.5041   LearningRate 0.0620   Epoch: 4   Global Step: 52840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:41:51,136-Speed 3075.66 samples/sec   Loss 11.7017   LearningRate 0.0620   Epoch: 4   Global Step: 52850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:41:54,480-Speed 3063.17 samples/sec   Loss 11.7797   LearningRate 0.0620   Epoch: 4   Global Step: 52860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:41:57,862-Speed 3028.98 samples/sec   Loss 11.4753   LearningRate 0.0620   Epoch: 4   Global Step: 52870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:01,204-Speed 3064.95 samples/sec   Loss 11.5872   LearningRate 0.0620   Epoch: 4   Global Step: 52880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:04,547-Speed 3064.29 samples/sec   Loss 11.6314   LearningRate 0.0620   Epoch: 4   Global Step: 52890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:07,912-Speed 3044.13 samples/sec   Loss 11.6410   LearningRate 0.0619   Epoch: 4   Global Step: 52900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:11,251-Speed 3067.86 samples/sec   Loss 11.5472   LearningRate 0.0619   Epoch: 4   Global Step: 52910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:14,628-Speed 3032.70 samples/sec   Loss 11.6932   LearningRate 0.0619   Epoch: 4   Global Step: 52920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:18,045-Speed 2997.49 samples/sec   Loss 11.6474   LearningRate 0.0619   Epoch: 4   Global Step: 52930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:42:21,366-Speed 3084.09 samples/sec   Loss 11.5503   LearningRate 0.0619   Epoch: 4   Global Step: 52940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:42:24,766-Speed 3012.85 samples/sec   Loss 11.6461   LearningRate 0.0619   Epoch: 4   Global Step: 52950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:42:28,168-Speed 3010.77 samples/sec   Loss 11.5648   LearningRate 0.0619   Epoch: 4   Global Step: 52960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:31,535-Speed 3042.25 samples/sec   Loss 11.6238   LearningRate 0.0619   Epoch: 4   Global Step: 52970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:34,877-Speed 3065.21 samples/sec   Loss 11.7090   LearningRate 0.0619   Epoch: 4   Global Step: 52980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:38,266-Speed 3021.92 samples/sec   Loss 11.6024   LearningRate 0.0619   Epoch: 4   Global Step: 52990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:41,623-Speed 3051.21 samples/sec   Loss 11.6165   LearningRate 0.0619   Epoch: 4   Global Step: 53000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:45,101-Speed 2945.11 samples/sec   Loss 11.5451   LearningRate 0.0619   Epoch: 4   Global Step: 53010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:48,500-Speed 3013.53 samples/sec   Loss 11.6437   LearningRate 0.0619   Epoch: 4   Global Step: 53020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:51,917-Speed 2997.66 samples/sec   Loss 11.5956   LearningRate 0.0619   Epoch: 4   Global Step: 53030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:55,223-Speed 3097.97 samples/sec   Loss 11.6205   LearningRate 0.0619   Epoch: 4   Global Step: 53040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:42:58,559-Speed 3070.31 samples/sec   Loss 11.4869   LearningRate 0.0619   Epoch: 4   Global Step: 53050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:43:02,012-Speed 2966.09 samples/sec   Loss 11.6238   LearningRate 0.0618   Epoch: 4   Global Step: 53060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:05,332-Speed 3086.17 samples/sec   Loss 11.6509   LearningRate 0.0618   Epoch: 4   Global Step: 53070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:08,654-Speed 3083.40 samples/sec   Loss 11.5492   LearningRate 0.0618   Epoch: 4   Global Step: 53080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:12,030-Speed 3033.81 samples/sec   Loss 11.3747   LearningRate 0.0618   Epoch: 4   Global Step: 53090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:15,388-Speed 3050.52 samples/sec   Loss 11.6113   LearningRate 0.0618   Epoch: 4   Global Step: 53100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:18,779-Speed 3020.50 samples/sec   Loss 11.6653   LearningRate 0.0618   Epoch: 4   Global Step: 53110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:22,112-Speed 3072.85 samples/sec   Loss 11.6415   LearningRate 0.0618   Epoch: 4   Global Step: 53120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:25,507-Speed 3017.20 samples/sec   Loss 11.4282   LearningRate 0.0618   Epoch: 4   Global Step: 53130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:28,929-Speed 2993.33 samples/sec   Loss 11.4686   LearningRate 0.0618   Epoch: 4   Global Step: 53140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:32,277-Speed 3059.27 samples/sec   Loss 11.4220   LearningRate 0.0618   Epoch: 4   Global Step: 53150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:35,630-Speed 3055.24 samples/sec   Loss 11.6781   LearningRate 0.0618   Epoch: 4   Global Step: 53160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:38,997-Speed 3042.00 samples/sec   Loss 11.6844   LearningRate 0.0618   Epoch: 4   Global Step: 53170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:42,341-Speed 3063.29 samples/sec   Loss 11.5940   LearningRate 0.0618   Epoch: 4   Global Step: 53180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:45,725-Speed 3026.66 samples/sec   Loss 11.5914   LearningRate 0.0618   Epoch: 4   Global Step: 53190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:49,142-Speed 2997.25 samples/sec   Loss 11.6731   LearningRate 0.0618   Epoch: 4   Global Step: 53200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:52,502-Speed 3048.92 samples/sec   Loss 11.5838   LearningRate 0.0617   Epoch: 4   Global Step: 53210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:55,885-Speed 3027.93 samples/sec   Loss 11.6365   LearningRate 0.0617   Epoch: 4   Global Step: 53220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:43:59,265-Speed 3030.23 samples/sec   Loss 11.6113   LearningRate 0.0617   Epoch: 4   Global Step: 53230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:44:02,672-Speed 3006.01 samples/sec   Loss 11.7168   LearningRate 0.0617   Epoch: 4   Global Step: 53240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:44:06,004-Speed 3074.29 samples/sec   Loss 11.5103   LearningRate 0.0617   Epoch: 4   Global Step: 53250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:44:09,327-Speed 3082.42 samples/sec   Loss 11.3810   LearningRate 0.0617   Epoch: 4   Global Step: 53260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:44:12,692-Speed 3043.98 samples/sec   Loss 11.5713   LearningRate 0.0617   Epoch: 4   Global Step: 53270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:44:16,082-Speed 3021.55 samples/sec   Loss 11.6224   LearningRate 0.0617   Epoch: 4   Global Step: 53280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:44:19,476-Speed 3018.24 samples/sec   Loss 11.5785   LearningRate 0.0617   Epoch: 4   Global Step: 53290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:44:22,861-Speed 3026.01 samples/sec   Loss 11.5583   LearningRate 0.0617   Epoch: 4   Global Step: 53300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:44:26,237-Speed 3033.94 samples/sec   Loss 11.6053   LearningRate 0.0617   Epoch: 4   Global Step: 53310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:44:29,536-Speed 3104.13 samples/sec   Loss 11.4815   LearningRate 0.0617   Epoch: 4   Global Step: 53320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:44:32,974-Speed 2979.55 samples/sec   Loss 11.5005   LearningRate 0.0617   Epoch: 4   Global Step: 53330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:44:36,437-Speed 2957.77 samples/sec   Loss 11.7660   LearningRate 0.0617   Epoch: 4   Global Step: 53340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:44:39,893-Speed 2964.20 samples/sec   Loss 11.6570   LearningRate 0.0617   Epoch: 4   Global Step: 53350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:44:43,361-Speed 2953.33 samples/sec   Loss 11.6226   LearningRate 0.0617   Epoch: 4   Global Step: 53360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:44:46,704-Speed 3064.19 samples/sec   Loss 11.3747   LearningRate 0.0616   Epoch: 4   Global Step: 53370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:44:50,094-Speed 3021.67 samples/sec   Loss 11.5258   LearningRate 0.0616   Epoch: 4   Global Step: 53380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:44:53,489-Speed 3016.31 samples/sec   Loss 11.6545   LearningRate 0.0616   Epoch: 4   Global Step: 53390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:44:56,869-Speed 3031.23 samples/sec   Loss 11.5663   LearningRate 0.0616   Epoch: 4   Global Step: 53400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:45:00,275-Speed 3007.16 samples/sec   Loss 11.6166   LearningRate 0.0616   Epoch: 4   Global Step: 53410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:45:03,681-Speed 3007.00 samples/sec   Loss 11.5613   LearningRate 0.0616   Epoch: 4   Global Step: 53420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:45:07,036-Speed 3053.36 samples/sec   Loss 11.6251   LearningRate 0.0616   Epoch: 4   Global Step: 53430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:45:10,363-Speed 3079.78 samples/sec   Loss 11.7030   LearningRate 0.0616   Epoch: 4   Global Step: 53440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:45:13,735-Speed 3037.29 samples/sec   Loss 11.4739   LearningRate 0.0616   Epoch: 4   Global Step: 53450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:45:17,089-Speed 3054.45 samples/sec   Loss 11.7657   LearningRate 0.0616   Epoch: 4   Global Step: 53460   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:45:20,461-Speed 3037.53 samples/sec   Loss 11.3226   LearningRate 0.0616   Epoch: 4   Global Step: 53470   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:45:23,888-Speed 2988.74 samples/sec   Loss 11.4777   LearningRate 0.0616   Epoch: 4   Global Step: 53480   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:45:27,308-Speed 2995.63 samples/sec   Loss 11.5155   LearningRate 0.0616   Epoch: 4   Global Step: 53490   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:45:30,728-Speed 2994.59 samples/sec   Loss 11.7165   LearningRate 0.0616   Epoch: 4   Global Step: 53500   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:45:34,103-Speed 3034.83 samples/sec   Loss 11.4667   LearningRate 0.0616   Epoch: 4   Global Step: 53510   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:45:37,502-Speed 3013.04 samples/sec   Loss 11.5797   LearningRate 0.0616   Epoch: 4   Global Step: 53520   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:45:40,931-Speed 2987.52 samples/sec   Loss 11.6972   LearningRate 0.0615   Epoch: 4   Global Step: 53530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:45:44,306-Speed 3034.99 samples/sec   Loss 11.6369   LearningRate 0.0615   Epoch: 4   Global Step: 53540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:45:47,631-Speed 3080.06 samples/sec   Loss 11.5096   LearningRate 0.0615   Epoch: 4   Global Step: 53550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:45:50,999-Speed 3042.53 samples/sec   Loss 11.6290   LearningRate 0.0615   Epoch: 4   Global Step: 53560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:45:54,347-Speed 3059.41 samples/sec   Loss 11.5215   LearningRate 0.0615   Epoch: 4   Global Step: 53570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:45:57,699-Speed 3055.45 samples/sec   Loss 11.7505   LearningRate 0.0615   Epoch: 4   Global Step: 53580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:46:01,098-Speed 3013.44 samples/sec   Loss 11.6729   LearningRate 0.0615   Epoch: 4   Global Step: 53590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:46:04,428-Speed 3076.13 samples/sec   Loss 11.6419   LearningRate 0.0615   Epoch: 4   Global Step: 53600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:46:07,815-Speed 3024.14 samples/sec   Loss 11.5296   LearningRate 0.0615   Epoch: 4   Global Step: 53610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:46:11,206-Speed 3020.58 samples/sec   Loss 11.6102   LearningRate 0.0615   Epoch: 4   Global Step: 53620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:46:14,591-Speed 3025.52 samples/sec   Loss 11.5859   LearningRate 0.0615   Epoch: 4   Global Step: 53630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:46:17,987-Speed 3016.93 samples/sec   Loss 11.6633   LearningRate 0.0615   Epoch: 4   Global Step: 53640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:46:21,306-Speed 3085.36 samples/sec   Loss 11.6721   LearningRate 0.0615   Epoch: 4   Global Step: 53650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:46:24,657-Speed 3056.82 samples/sec   Loss 11.5103   LearningRate 0.0615   Epoch: 4   Global Step: 53660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:46:28,029-Speed 3037.86 samples/sec   Loss 11.7440   LearningRate 0.0615   Epoch: 4   Global Step: 53670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:46:31,330-Speed 3102.95 samples/sec   Loss 11.4998   LearningRate 0.0615   Epoch: 4   Global Step: 53680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:46:34,688-Speed 3049.80 samples/sec   Loss 11.5729   LearningRate 0.0614   Epoch: 4   Global Step: 53690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:46:38,109-Speed 2994.50 samples/sec   Loss 11.3812   LearningRate 0.0614   Epoch: 4   Global Step: 53700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:46:41,442-Speed 3073.02 samples/sec   Loss 11.5781   LearningRate 0.0614   Epoch: 4   Global Step: 53710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:46:44,855-Speed 3000.49 samples/sec   Loss 11.5163   LearningRate 0.0614   Epoch: 4   Global Step: 53720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 06:46:48,256-Speed 3012.62 samples/sec   Loss 11.5531   LearningRate 0.0614   Epoch: 4   Global Step: 53730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:46:51,703-Speed 2971.51 samples/sec   Loss 11.5736   LearningRate 0.0614   Epoch: 4   Global Step: 53740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:46:55,082-Speed 3030.99 samples/sec   Loss 11.5702   LearningRate 0.0614   Epoch: 4   Global Step: 53750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:46:58,511-Speed 2987.66 samples/sec   Loss 11.5257   LearningRate 0.0614   Epoch: 4   Global Step: 53760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:47:01,886-Speed 3034.46 samples/sec   Loss 11.5903   LearningRate 0.0614   Epoch: 4   Global Step: 53770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:47:05,274-Speed 3023.85 samples/sec   Loss 11.6636   LearningRate 0.0614   Epoch: 4   Global Step: 53780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:47:08,738-Speed 2956.99 samples/sec   Loss 11.6953   LearningRate 0.0614   Epoch: 4   Global Step: 53790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:47:12,093-Speed 3053.28 samples/sec   Loss 11.5624   LearningRate 0.0614   Epoch: 4   Global Step: 53800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:47:15,411-Speed 3086.92 samples/sec   Loss 11.5044   LearningRate 0.0614   Epoch: 4   Global Step: 53810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:47:18,793-Speed 3028.85 samples/sec   Loss 11.5833   LearningRate 0.0614   Epoch: 4   Global Step: 53820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:47:22,168-Speed 3034.65 samples/sec   Loss 11.6299   LearningRate 0.0614   Epoch: 4   Global Step: 53830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:47:25,574-Speed 3007.05 samples/sec   Loss 11.4583   LearningRate 0.0614   Epoch: 4   Global Step: 53840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:47:28,955-Speed 3030.17 samples/sec   Loss 11.5450   LearningRate 0.0613   Epoch: 4   Global Step: 53850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:47:32,342-Speed 3024.04 samples/sec   Loss 11.4612   LearningRate 0.0613   Epoch: 4   Global Step: 53860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:47:35,823-Speed 2942.08 samples/sec   Loss 11.6803   LearningRate 0.0613   Epoch: 4   Global Step: 53870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:47:39,241-Speed 2997.48 samples/sec   Loss 11.4536   LearningRate 0.0613   Epoch: 4   Global Step: 53880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:47:42,644-Speed 3009.94 samples/sec   Loss 11.5574   LearningRate 0.0613   Epoch: 4   Global Step: 53890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:47:45,962-Speed 3087.06 samples/sec   Loss 11.6529   LearningRate 0.0613   Epoch: 4   Global Step: 53900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:47:49,349-Speed 3024.22 samples/sec   Loss 11.5886   LearningRate 0.0613   Epoch: 4   Global Step: 53910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:47:52,665-Speed 3089.10 samples/sec   Loss 11.6872   LearningRate 0.0613   Epoch: 4   Global Step: 53920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:47:55,967-Speed 3102.01 samples/sec   Loss 11.4107   LearningRate 0.0613   Epoch: 4   Global Step: 53930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:47:59,262-Speed 3108.53 samples/sec   Loss 11.6026   LearningRate 0.0613   Epoch: 4   Global Step: 53940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:48:02,581-Speed 3086.04 samples/sec   Loss 11.5950   LearningRate 0.0613   Epoch: 4   Global Step: 53950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:48:05,987-Speed 3007.21 samples/sec   Loss 11.4597   LearningRate 0.0613   Epoch: 4   Global Step: 53960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:48:09,305-Speed 3087.15 samples/sec   Loss 11.6579   LearningRate 0.0613   Epoch: 4   Global Step: 53970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:48:12,778-Speed 2949.43 samples/sec   Loss 11.7087   LearningRate 0.0613   Epoch: 4   Global Step: 53980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:48:16,206-Speed 2988.05 samples/sec   Loss 11.7129   LearningRate 0.0613   Epoch: 4   Global Step: 53990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:48:20,473-Speed 2400.55 samples/sec   Loss 11.7100   LearningRate 0.0613   Epoch: 4   Global Step: 54000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:48:23,958-Speed 2939.01 samples/sec   Loss 11.6768   LearningRate 0.0612   Epoch: 4   Global Step: 54010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:48:27,356-Speed 3014.47 samples/sec   Loss 11.6457   LearningRate 0.0612   Epoch: 4   Global Step: 54020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:48:30,714-Speed 3050.07 samples/sec   Loss 11.5811   LearningRate 0.0612   Epoch: 4   Global Step: 54030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:48:34,148-Speed 2982.53 samples/sec   Loss 11.5617   LearningRate 0.0612   Epoch: 4   Global Step: 54040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:48:37,568-Speed 2995.31 samples/sec   Loss 11.6401   LearningRate 0.0612   Epoch: 4   Global Step: 54050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:48:40,911-Speed 3065.68 samples/sec   Loss 11.5651   LearningRate 0.0612   Epoch: 4   Global Step: 54060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:48:44,202-Speed 3111.71 samples/sec   Loss 11.5416   LearningRate 0.0612   Epoch: 4   Global Step: 54070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:48:47,577-Speed 3035.50 samples/sec   Loss 11.4634   LearningRate 0.0612   Epoch: 4   Global Step: 54080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:48:51,091-Speed 2914.44 samples/sec   Loss 11.5157   LearningRate 0.0612   Epoch: 4   Global Step: 54090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:48:54,560-Speed 2952.80 samples/sec   Loss 11.6055   LearningRate 0.0612   Epoch: 4   Global Step: 54100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:48:57,989-Speed 2987.53 samples/sec   Loss 11.7500   LearningRate 0.0612   Epoch: 4   Global Step: 54110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:49:01,390-Speed 3011.65 samples/sec   Loss 11.5336   LearningRate 0.0612   Epoch: 4   Global Step: 54120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:04,764-Speed 3035.75 samples/sec   Loss 11.6623   LearningRate 0.0612   Epoch: 4   Global Step: 54130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:08,184-Speed 2994.67 samples/sec   Loss 11.6055   LearningRate 0.0612   Epoch: 4   Global Step: 54140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:11,580-Speed 3016.82 samples/sec   Loss 11.5778   LearningRate 0.0612   Epoch: 4   Global Step: 54150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:14,945-Speed 3043.46 samples/sec   Loss 11.6073   LearningRate 0.0611   Epoch: 4   Global Step: 54160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:18,294-Speed 3058.70 samples/sec   Loss 11.7006   LearningRate 0.0611   Epoch: 4   Global Step: 54170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:21,707-Speed 3000.44 samples/sec   Loss 11.6388   LearningRate 0.0611   Epoch: 4   Global Step: 54180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:25,037-Speed 3076.50 samples/sec   Loss 11.7313   LearningRate 0.0611   Epoch: 4   Global Step: 54190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:28,458-Speed 2994.29 samples/sec   Loss 11.5348   LearningRate 0.0611   Epoch: 4   Global Step: 54200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:31,810-Speed 3055.10 samples/sec   Loss 11.7427   LearningRate 0.0611   Epoch: 4   Global Step: 54210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:35,141-Speed 3075.67 samples/sec   Loss 11.6284   LearningRate 0.0611   Epoch: 4   Global Step: 54220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:49:38,515-Speed 3035.99 samples/sec   Loss 11.6754   LearningRate 0.0611   Epoch: 4   Global Step: 54230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:49:41,837-Speed 3083.24 samples/sec   Loss 11.4993   LearningRate 0.0611   Epoch: 4   Global Step: 54240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:49:45,147-Speed 3094.99 samples/sec   Loss 11.7094   LearningRate 0.0611   Epoch: 4   Global Step: 54250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:49:48,577-Speed 2986.02 samples/sec   Loss 11.4413   LearningRate 0.0611   Epoch: 4   Global Step: 54260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:52,036-Speed 2961.63 samples/sec   Loss 11.5536   LearningRate 0.0611   Epoch: 4   Global Step: 54270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:55,409-Speed 3037.23 samples/sec   Loss 11.6943   LearningRate 0.0611   Epoch: 4   Global Step: 54280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:49:58,835-Speed 2989.57 samples/sec   Loss 11.7027   LearningRate 0.0611   Epoch: 4   Global Step: 54290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:02,183-Speed 3059.69 samples/sec   Loss 11.6542   LearningRate 0.0611   Epoch: 4   Global Step: 54300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:05,575-Speed 3019.48 samples/sec   Loss 11.5895   LearningRate 0.0611   Epoch: 4   Global Step: 54310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:08,951-Speed 3033.81 samples/sec   Loss 11.6485   LearningRate 0.0610   Epoch: 4   Global Step: 54320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:12,297-Speed 3061.52 samples/sec   Loss 11.6097   LearningRate 0.0610   Epoch: 4   Global Step: 54330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:15,672-Speed 3034.24 samples/sec   Loss 11.5364   LearningRate 0.0610   Epoch: 4   Global Step: 54340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:19,019-Speed 3061.27 samples/sec   Loss 11.5736   LearningRate 0.0610   Epoch: 4   Global Step: 54350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:22,470-Speed 2968.20 samples/sec   Loss 11.5920   LearningRate 0.0610   Epoch: 4   Global Step: 54360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:50:25,827-Speed 3051.15 samples/sec   Loss 11.6272   LearningRate 0.0610   Epoch: 4   Global Step: 54370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:29,131-Speed 3099.61 samples/sec   Loss 11.4548   LearningRate 0.0610   Epoch: 4   Global Step: 54380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:32,559-Speed 2988.43 samples/sec   Loss 11.6443   LearningRate 0.0610   Epoch: 4   Global Step: 54390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:35,934-Speed 3036.30 samples/sec   Loss 11.4786   LearningRate 0.0610   Epoch: 4   Global Step: 54400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:39,224-Speed 3113.74 samples/sec   Loss 11.6555   LearningRate 0.0610   Epoch: 4   Global Step: 54410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:42,506-Speed 3120.54 samples/sec   Loss 11.3626   LearningRate 0.0610   Epoch: 4   Global Step: 54420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:45,961-Speed 2964.25 samples/sec   Loss 11.6597   LearningRate 0.0610   Epoch: 4   Global Step: 54430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:49,407-Speed 2972.54 samples/sec   Loss 11.7660   LearningRate 0.0610   Epoch: 4   Global Step: 54440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:52,706-Speed 3105.71 samples/sec   Loss 11.3766   LearningRate 0.0610   Epoch: 4   Global Step: 54450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:56,114-Speed 3005.07 samples/sec   Loss 11.5163   LearningRate 0.0610   Epoch: 4   Global Step: 54460   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:50:59,443-Speed 3077.14 samples/sec   Loss 11.5583   LearningRate 0.0610   Epoch: 4   Global Step: 54470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:51:02,758-Speed 3090.07 samples/sec   Loss 11.6187   LearningRate 0.0609   Epoch: 4   Global Step: 54480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:51:06,125-Speed 3042.15 samples/sec   Loss 11.6358   LearningRate 0.0609   Epoch: 4   Global Step: 54490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:51:09,443-Speed 3087.68 samples/sec   Loss 11.6346   LearningRate 0.0609   Epoch: 4   Global Step: 54500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:51:12,851-Speed 3005.67 samples/sec   Loss 11.5825   LearningRate 0.0609   Epoch: 4   Global Step: 54510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:51:16,298-Speed 2971.80 samples/sec   Loss 11.6363   LearningRate 0.0609   Epoch: 4   Global Step: 54520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:51:19,623-Speed 3079.87 samples/sec   Loss 11.5246   LearningRate 0.0609   Epoch: 4   Global Step: 54530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:51:22,970-Speed 3060.64 samples/sec   Loss 11.6334   LearningRate 0.0609   Epoch: 4   Global Step: 54540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:51:26,356-Speed 3025.12 samples/sec   Loss 11.6182   LearningRate 0.0609   Epoch: 4   Global Step: 54550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:51:29,712-Speed 3052.33 samples/sec   Loss 11.6346   LearningRate 0.0609   Epoch: 4   Global Step: 54560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:51:33,116-Speed 3009.03 samples/sec   Loss 11.4725   LearningRate 0.0609   Epoch: 4   Global Step: 54570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:51:36,496-Speed 3030.79 samples/sec   Loss 11.5694   LearningRate 0.0609   Epoch: 4   Global Step: 54580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:51:39,967-Speed 2950.55 samples/sec   Loss 11.4761   LearningRate 0.0609   Epoch: 4   Global Step: 54590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:51:43,384-Speed 2998.30 samples/sec   Loss 11.6286   LearningRate 0.0609   Epoch: 4   Global Step: 54600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:51:46,795-Speed 3002.90 samples/sec   Loss 11.3999   LearningRate 0.0609   Epoch: 4   Global Step: 54610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:51:50,135-Speed 3066.74 samples/sec   Loss 11.3411   LearningRate 0.0609   Epoch: 4   Global Step: 54620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:51:53,540-Speed 3007.39 samples/sec   Loss 11.4038   LearningRate 0.0609   Epoch: 4   Global Step: 54630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:51:56,864-Speed 3081.65 samples/sec   Loss 11.6866   LearningRate 0.0608   Epoch: 4   Global Step: 54640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:52:00,241-Speed 3032.96 samples/sec   Loss 11.4638   LearningRate 0.0608   Epoch: 4   Global Step: 54650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:52:03,608-Speed 3042.55 samples/sec   Loss 11.6801   LearningRate 0.0608   Epoch: 4   Global Step: 54660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:52:06,920-Speed 3092.99 samples/sec   Loss 11.6925   LearningRate 0.0608   Epoch: 4   Global Step: 54670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:52:10,298-Speed 3032.00 samples/sec   Loss 11.5874   LearningRate 0.0608   Epoch: 4   Global Step: 54680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:52:13,662-Speed 3044.69 samples/sec   Loss 11.5637   LearningRate 0.0608   Epoch: 4   Global Step: 54690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:52:16,968-Speed 3098.58 samples/sec   Loss 11.3575   LearningRate 0.0608   Epoch: 4   Global Step: 54700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:52:20,315-Speed 3059.97 samples/sec   Loss 11.4761   LearningRate 0.0608   Epoch: 4   Global Step: 54710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:52:23,713-Speed 3014.92 samples/sec   Loss 11.4925   LearningRate 0.0608   Epoch: 4   Global Step: 54720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:52:27,099-Speed 3024.84 samples/sec   Loss 11.6400   LearningRate 0.0608   Epoch: 4   Global Step: 54730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:52:30,477-Speed 3032.17 samples/sec   Loss 11.6669   LearningRate 0.0608   Epoch: 4   Global Step: 54740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:52:33,799-Speed 3083.49 samples/sec   Loss 11.6831   LearningRate 0.0608   Epoch: 4   Global Step: 54750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:52:37,173-Speed 3035.64 samples/sec   Loss 11.5887   LearningRate 0.0608   Epoch: 4   Global Step: 54760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:52:40,540-Speed 3042.44 samples/sec   Loss 11.6317   LearningRate 0.0608   Epoch: 4   Global Step: 54770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:52:43,988-Speed 2971.30 samples/sec   Loss 11.5056   LearningRate 0.0608   Epoch: 4   Global Step: 54780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:52:47,477-Speed 2935.18 samples/sec   Loss 11.7754   LearningRate 0.0608   Epoch: 4   Global Step: 54790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:52:50,791-Speed 3091.37 samples/sec   Loss 11.6175   LearningRate 0.0607   Epoch: 4   Global Step: 54800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:52:54,200-Speed 3004.93 samples/sec   Loss 11.6063   LearningRate 0.0607   Epoch: 4   Global Step: 54810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:52:57,595-Speed 3016.84 samples/sec   Loss 11.6220   LearningRate 0.0607   Epoch: 4   Global Step: 54820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:01,008-Speed 3001.26 samples/sec   Loss 11.5411   LearningRate 0.0607   Epoch: 4   Global Step: 54830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:04,344-Speed 3071.16 samples/sec   Loss 11.4212   LearningRate 0.0607   Epoch: 4   Global Step: 54840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:07,789-Speed 2973.49 samples/sec   Loss 11.5562   LearningRate 0.0607   Epoch: 4   Global Step: 54850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:11,188-Speed 3013.86 samples/sec   Loss 11.5483   LearningRate 0.0607   Epoch: 4   Global Step: 54860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:14,555-Speed 3042.11 samples/sec   Loss 11.5924   LearningRate 0.0607   Epoch: 4   Global Step: 54870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:17,932-Speed 3032.50 samples/sec   Loss 11.4875   LearningRate 0.0607   Epoch: 4   Global Step: 54880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:21,290-Speed 3050.34 samples/sec   Loss 11.4838   LearningRate 0.0607   Epoch: 4   Global Step: 54890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:24,711-Speed 2994.72 samples/sec   Loss 11.6014   LearningRate 0.0607   Epoch: 4   Global Step: 54900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:28,083-Speed 3036.93 samples/sec   Loss 11.6597   LearningRate 0.0607   Epoch: 4   Global Step: 54910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:31,469-Speed 3025.14 samples/sec   Loss 11.5459   LearningRate 0.0607   Epoch: 4   Global Step: 54920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:34,890-Speed 2993.98 samples/sec   Loss 11.3760   LearningRate 0.0607   Epoch: 4   Global Step: 54930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:38,403-Speed 2924.40 samples/sec   Loss 11.6175   LearningRate 0.0607   Epoch: 4   Global Step: 54940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:41,750-Speed 3060.34 samples/sec   Loss 11.7068   LearningRate 0.0607   Epoch: 4   Global Step: 54950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:45,075-Speed 3080.55 samples/sec   Loss 11.5954   LearningRate 0.0606   Epoch: 4   Global Step: 54960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:48,448-Speed 3036.57 samples/sec   Loss 11.5201   LearningRate 0.0606   Epoch: 4   Global Step: 54970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:51,785-Speed 3070.09 samples/sec   Loss 11.4773   LearningRate 0.0606   Epoch: 4   Global Step: 54980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:55,150-Speed 3043.62 samples/sec   Loss 11.4960   LearningRate 0.0606   Epoch: 4   Global Step: 54990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:53:58,466-Speed 3089.65 samples/sec   Loss 11.5797   LearningRate 0.0606   Epoch: 4   Global Step: 55000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:01,840-Speed 3035.52 samples/sec   Loss 11.3024   LearningRate 0.0606   Epoch: 4   Global Step: 55010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:05,233-Speed 3018.92 samples/sec   Loss 11.6026   LearningRate 0.0606   Epoch: 4   Global Step: 55020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:08,594-Speed 3047.72 samples/sec   Loss 11.6042   LearningRate 0.0606   Epoch: 4   Global Step: 55030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:11,932-Speed 3069.19 samples/sec   Loss 11.5802   LearningRate 0.0606   Epoch: 4   Global Step: 55040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:15,325-Speed 3018.49 samples/sec   Loss 11.4687   LearningRate 0.0606   Epoch: 4   Global Step: 55050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:18,703-Speed 3031.75 samples/sec   Loss 11.5485   LearningRate 0.0606   Epoch: 4   Global Step: 55060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:22,053-Speed 3058.48 samples/sec   Loss 11.5818   LearningRate 0.0606   Epoch: 4   Global Step: 55070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:25,424-Speed 3038.78 samples/sec   Loss 11.6391   LearningRate 0.0606   Epoch: 4   Global Step: 55080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:28,835-Speed 3002.88 samples/sec   Loss 11.4567   LearningRate 0.0606   Epoch: 4   Global Step: 55090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:32,286-Speed 2968.48 samples/sec   Loss 11.3760   LearningRate 0.0606   Epoch: 4   Global Step: 55100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:35,620-Speed 3072.28 samples/sec   Loss 11.6156   LearningRate 0.0606   Epoch: 4   Global Step: 55110   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-27 06:54:38,999-Speed 3031.58 samples/sec   Loss 11.6581   LearningRate 0.0605   Epoch: 4   Global Step: 55120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:42,336-Speed 3069.20 samples/sec   Loss 11.6172   LearningRate 0.0605   Epoch: 4   Global Step: 55130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:45,730-Speed 3018.00 samples/sec   Loss 11.5688   LearningRate 0.0605   Epoch: 4   Global Step: 55140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:49,087-Speed 3051.51 samples/sec   Loss 11.4421   LearningRate 0.0605   Epoch: 4   Global Step: 55150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:52,506-Speed 2995.62 samples/sec   Loss 11.6760   LearningRate 0.0605   Epoch: 4   Global Step: 55160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:55,844-Speed 3068.41 samples/sec   Loss 11.5579   LearningRate 0.0605   Epoch: 4   Global Step: 55170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:54:59,217-Speed 3036.36 samples/sec   Loss 11.6270   LearningRate 0.0605   Epoch: 4   Global Step: 55180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:02,678-Speed 2959.86 samples/sec   Loss 11.4861   LearningRate 0.0605   Epoch: 4   Global Step: 55190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:06,103-Speed 2990.16 samples/sec   Loss 11.4877   LearningRate 0.0605   Epoch: 4   Global Step: 55200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:09,447-Speed 3063.74 samples/sec   Loss 11.4588   LearningRate 0.0605   Epoch: 4   Global Step: 55210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:12,853-Speed 3007.53 samples/sec   Loss 11.5588   LearningRate 0.0605   Epoch: 4   Global Step: 55220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:16,243-Speed 3021.58 samples/sec   Loss 11.5880   LearningRate 0.0605   Epoch: 4   Global Step: 55230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:19,609-Speed 3042.40 samples/sec   Loss 11.3791   LearningRate 0.0605   Epoch: 4   Global Step: 55240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:22,987-Speed 3032.87 samples/sec   Loss 11.5418   LearningRate 0.0605   Epoch: 4   Global Step: 55250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:26,357-Speed 3039.79 samples/sec   Loss 11.5544   LearningRate 0.0605   Epoch: 4   Global Step: 55260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:29,770-Speed 3000.78 samples/sec   Loss 11.4896   LearningRate 0.0605   Epoch: 4   Global Step: 55270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:33,111-Speed 3066.17 samples/sec   Loss 11.5447   LearningRate 0.0604   Epoch: 4   Global Step: 55280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:36,468-Speed 3051.28 samples/sec   Loss 11.7086   LearningRate 0.0604   Epoch: 4   Global Step: 55290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:39,938-Speed 2951.82 samples/sec   Loss 11.5620   LearningRate 0.0604   Epoch: 4   Global Step: 55300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:55:43,298-Speed 3047.82 samples/sec   Loss 11.6295   LearningRate 0.0604   Epoch: 4   Global Step: 55310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:55:46,698-Speed 3013.52 samples/sec   Loss 11.6400   LearningRate 0.0604   Epoch: 4   Global Step: 55320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:55:50,059-Speed 3047.07 samples/sec   Loss 11.4215   LearningRate 0.0604   Epoch: 4   Global Step: 55330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:55:53,424-Speed 3044.43 samples/sec   Loss 11.4877   LearningRate 0.0604   Epoch: 4   Global Step: 55340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:55:56,880-Speed 2963.68 samples/sec   Loss 11.5364   LearningRate 0.0604   Epoch: 4   Global Step: 55350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:56:00,282-Speed 3011.00 samples/sec   Loss 11.5523   LearningRate 0.0604   Epoch: 4   Global Step: 55360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:56:03,656-Speed 3036.27 samples/sec   Loss 11.3728   LearningRate 0.0604   Epoch: 4   Global Step: 55370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:56:07,032-Speed 3033.49 samples/sec   Loss 11.3947   LearningRate 0.0604   Epoch: 4   Global Step: 55380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:56:10,479-Speed 2971.73 samples/sec   Loss 11.6221   LearningRate 0.0604   Epoch: 4   Global Step: 55390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:56:13,883-Speed 3008.47 samples/sec   Loss 11.4897   LearningRate 0.0604   Epoch: 4   Global Step: 55400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:56:17,275-Speed 3019.85 samples/sec   Loss 11.4919   LearningRate 0.0604   Epoch: 4   Global Step: 55410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:56:20,652-Speed 3033.25 samples/sec   Loss 11.4363   LearningRate 0.0604   Epoch: 4   Global Step: 55420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:56:24,049-Speed 3015.25 samples/sec   Loss 11.5609   LearningRate 0.0604   Epoch: 4   Global Step: 55430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:56:27,409-Speed 3048.55 samples/sec   Loss 11.5522   LearningRate 0.0603   Epoch: 4   Global Step: 55440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:56:30,798-Speed 3023.00 samples/sec   Loss 11.5554   LearningRate 0.0603   Epoch: 4   Global Step: 55450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:56:34,222-Speed 2990.65 samples/sec   Loss 11.4038   LearningRate 0.0603   Epoch: 4   Global Step: 55460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:56:37,581-Speed 3049.14 samples/sec   Loss 11.5485   LearningRate 0.0603   Epoch: 4   Global Step: 55470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:56:41,041-Speed 2960.67 samples/sec   Loss 11.4855   LearningRate 0.0603   Epoch: 4   Global Step: 55480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:56:44,465-Speed 2991.82 samples/sec   Loss 11.7439   LearningRate 0.0603   Epoch: 4   Global Step: 55490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:56:47,884-Speed 2995.81 samples/sec   Loss 11.5072   LearningRate 0.0603   Epoch: 4   Global Step: 55500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:56:51,219-Speed 3071.84 samples/sec   Loss 11.3675   LearningRate 0.0603   Epoch: 4   Global Step: 55510   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-27 06:56:54,603-Speed 3026.64 samples/sec   Loss 11.5961   LearningRate 0.0603   Epoch: 4   Global Step: 55520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:56:57,973-Speed 3039.31 samples/sec   Loss 11.4663   LearningRate 0.0603   Epoch: 4   Global Step: 55530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:57:01,355-Speed 3029.03 samples/sec   Loss 11.5605   LearningRate 0.0603   Epoch: 4   Global Step: 55540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:57:04,761-Speed 3007.30 samples/sec   Loss 11.5025   LearningRate 0.0603   Epoch: 4   Global Step: 55550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:57:08,179-Speed 2996.79 samples/sec   Loss 11.5076   LearningRate 0.0603   Epoch: 4   Global Step: 55560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:57:11,602-Speed 2992.08 samples/sec   Loss 11.4896   LearningRate 0.0603   Epoch: 4   Global Step: 55570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:57:15,063-Speed 2960.10 samples/sec   Loss 11.6332   LearningRate 0.0603   Epoch: 4   Global Step: 55580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:57:18,403-Speed 3065.74 samples/sec   Loss 11.3800   LearningRate 0.0603   Epoch: 4   Global Step: 55590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:57:21,760-Speed 3051.45 samples/sec   Loss 11.4920   LearningRate 0.0602   Epoch: 4   Global Step: 55600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:57:25,161-Speed 3012.45 samples/sec   Loss 11.5997   LearningRate 0.0602   Epoch: 4   Global Step: 55610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:57:28,638-Speed 2946.02 samples/sec   Loss 11.4053   LearningRate 0.0602   Epoch: 4   Global Step: 55620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:57:31,963-Speed 3080.39 samples/sec   Loss 11.5900   LearningRate 0.0602   Epoch: 4   Global Step: 55630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:57:35,414-Speed 2967.71 samples/sec   Loss 11.5037   LearningRate 0.0602   Epoch: 4   Global Step: 55640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:57:38,780-Speed 3042.90 samples/sec   Loss 11.4142   LearningRate 0.0602   Epoch: 4   Global Step: 55650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:57:42,230-Speed 2969.39 samples/sec   Loss 11.4739   LearningRate 0.0602   Epoch: 4   Global Step: 55660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:57:45,566-Speed 3070.17 samples/sec   Loss 11.5183   LearningRate 0.0602   Epoch: 4   Global Step: 55670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:57:48,960-Speed 3018.36 samples/sec   Loss 11.5945   LearningRate 0.0602   Epoch: 4   Global Step: 55680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:57:52,322-Speed 3047.18 samples/sec   Loss 11.3687   LearningRate 0.0602   Epoch: 4   Global Step: 55690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:57:55,679-Speed 3050.95 samples/sec   Loss 11.5815   LearningRate 0.0602   Epoch: 4   Global Step: 55700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:57:59,065-Speed 3025.70 samples/sec   Loss 11.3251   LearningRate 0.0602   Epoch: 4   Global Step: 55710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:58:02,544-Speed 2944.30 samples/sec   Loss 11.5321   LearningRate 0.0602   Epoch: 4   Global Step: 55720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:58:05,928-Speed 3027.01 samples/sec   Loss 11.5871   LearningRate 0.0602   Epoch: 4   Global Step: 55730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:58:09,328-Speed 3012.88 samples/sec   Loss 11.3673   LearningRate 0.0602   Epoch: 4   Global Step: 55740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:58:12,759-Speed 2985.13 samples/sec   Loss 11.4013   LearningRate 0.0602   Epoch: 4   Global Step: 55750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:58:16,176-Speed 2997.61 samples/sec   Loss 11.4281   LearningRate 0.0601   Epoch: 4   Global Step: 55760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:58:19,545-Speed 3040.66 samples/sec   Loss 11.4786   LearningRate 0.0601   Epoch: 4   Global Step: 55770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:58:22,959-Speed 3000.49 samples/sec   Loss 11.3154   LearningRate 0.0601   Epoch: 4   Global Step: 55780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:58:26,419-Speed 2960.36 samples/sec   Loss 11.4079   LearningRate 0.0601   Epoch: 4   Global Step: 55790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:58:29,877-Speed 2961.95 samples/sec   Loss 11.5757   LearningRate 0.0601   Epoch: 4   Global Step: 55800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:58:33,295-Speed 2997.28 samples/sec   Loss 11.3913   LearningRate 0.0601   Epoch: 4   Global Step: 55810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:58:36,688-Speed 3018.93 samples/sec   Loss 11.4472   LearningRate 0.0601   Epoch: 4   Global Step: 55820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:58:40,127-Speed 2978.13 samples/sec   Loss 11.5921   LearningRate 0.0601   Epoch: 4   Global Step: 55830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:58:44,829-Speed 2178.20 samples/sec   Loss 11.4634   LearningRate 0.0601   Epoch: 4   Global Step: 55840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:58:48,809-Speed 2573.51 samples/sec   Loss 11.4984   LearningRate 0.0601   Epoch: 4   Global Step: 55850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:58:52,215-Speed 3007.76 samples/sec   Loss 11.4198   LearningRate 0.0601   Epoch: 4   Global Step: 55860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:58:57,020-Speed 2131.66 samples/sec   Loss 11.5302   LearningRate 0.0601   Epoch: 4   Global Step: 55870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:59:00,399-Speed 3031.86 samples/sec   Loss 11.5322   LearningRate 0.0601   Epoch: 4   Global Step: 55880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:59:03,735-Speed 3069.89 samples/sec   Loss 11.3742   LearningRate 0.0601   Epoch: 4   Global Step: 55890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 06:59:07,126-Speed 3020.84 samples/sec   Loss 11.4942   LearningRate 0.0601   Epoch: 4   Global Step: 55900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:10,554-Speed 2988.13 samples/sec   Loss 11.3215   LearningRate 0.0601   Epoch: 4   Global Step: 55910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:13,863-Speed 3095.08 samples/sec   Loss 11.4072   LearningRate 0.0600   Epoch: 4   Global Step: 55920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:17,207-Speed 3062.97 samples/sec   Loss 11.3841   LearningRate 0.0600   Epoch: 4   Global Step: 55930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:20,551-Speed 3063.40 samples/sec   Loss 11.5572   LearningRate 0.0600   Epoch: 4   Global Step: 55940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:23,902-Speed 3057.06 samples/sec   Loss 11.4554   LearningRate 0.0600   Epoch: 4   Global Step: 55950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:27,295-Speed 3019.62 samples/sec   Loss 11.2526   LearningRate 0.0600   Epoch: 4   Global Step: 55960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:30,664-Speed 3039.68 samples/sec   Loss 11.5452   LearningRate 0.0600   Epoch: 4   Global Step: 55970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:34,050-Speed 3025.20 samples/sec   Loss 11.5616   LearningRate 0.0600   Epoch: 4   Global Step: 55980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:37,528-Speed 2945.73 samples/sec   Loss 11.3846   LearningRate 0.0600   Epoch: 4   Global Step: 55990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:40,854-Speed 3079.15 samples/sec   Loss 11.5850   LearningRate 0.0600   Epoch: 4   Global Step: 56000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:44,210-Speed 3052.52 samples/sec   Loss 11.3889   LearningRate 0.0600   Epoch: 4   Global Step: 56010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:47,613-Speed 3010.19 samples/sec   Loss 11.5346   LearningRate 0.0600   Epoch: 4   Global Step: 56020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:51,014-Speed 3011.06 samples/sec   Loss 11.4462   LearningRate 0.0600   Epoch: 4   Global Step: 56030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:54,405-Speed 3021.27 samples/sec   Loss 11.3112   LearningRate 0.0600   Epoch: 4   Global Step: 56040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 06:59:57,738-Speed 3072.81 samples/sec   Loss 11.6375   LearningRate 0.0600   Epoch: 4   Global Step: 56050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:01,124-Speed 3025.02 samples/sec   Loss 11.4595   LearningRate 0.0600   Epoch: 4   Global Step: 56060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:04,588-Speed 2957.20 samples/sec   Loss 11.5088   LearningRate 0.0600   Epoch: 4   Global Step: 56070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:08,013-Speed 2990.94 samples/sec   Loss 11.5526   LearningRate 0.0599   Epoch: 4   Global Step: 56080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:11,454-Speed 2976.19 samples/sec   Loss 11.4891   LearningRate 0.0599   Epoch: 4   Global Step: 56090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:14,816-Speed 3046.72 samples/sec   Loss 11.4998   LearningRate 0.0599   Epoch: 4   Global Step: 56100   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-27 07:00:18,144-Speed 3078.45 samples/sec   Loss 11.5691   LearningRate 0.0599   Epoch: 4   Global Step: 56110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:21,533-Speed 3022.69 samples/sec   Loss 11.5568   LearningRate 0.0599   Epoch: 4   Global Step: 56120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:24,956-Speed 2992.72 samples/sec   Loss 11.4859   LearningRate 0.0599   Epoch: 4   Global Step: 56130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:28,345-Speed 3022.34 samples/sec   Loss 11.7370   LearningRate 0.0599   Epoch: 4   Global Step: 56140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:31,735-Speed 3022.16 samples/sec   Loss 11.4940   LearningRate 0.0599   Epoch: 4   Global Step: 56150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:35,208-Speed 2949.01 samples/sec   Loss 11.5322   LearningRate 0.0599   Epoch: 4   Global Step: 56160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:38,570-Speed 3046.96 samples/sec   Loss 11.4676   LearningRate 0.0599   Epoch: 4   Global Step: 56170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:41,945-Speed 3035.00 samples/sec   Loss 11.2923   LearningRate 0.0599   Epoch: 4   Global Step: 56180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:45,293-Speed 3059.24 samples/sec   Loss 11.3572   LearningRate 0.0599   Epoch: 4   Global Step: 56190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:48,730-Speed 2980.28 samples/sec   Loss 11.4926   LearningRate 0.0599   Epoch: 4   Global Step: 56200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:52,139-Speed 3004.78 samples/sec   Loss 11.3945   LearningRate 0.0599   Epoch: 4   Global Step: 56210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:55,561-Speed 2993.30 samples/sec   Loss 11.4882   LearningRate 0.0599   Epoch: 4   Global Step: 56220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:00:59,041-Speed 2943.15 samples/sec   Loss 11.5208   LearningRate 0.0599   Epoch: 4   Global Step: 56230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:01:02,430-Speed 3022.75 samples/sec   Loss 11.4793   LearningRate 0.0598   Epoch: 4   Global Step: 56240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:01:05,815-Speed 3025.36 samples/sec   Loss 11.4419   LearningRate 0.0598   Epoch: 4   Global Step: 56250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:01:09,201-Speed 3025.50 samples/sec   Loss 11.4253   LearningRate 0.0598   Epoch: 4   Global Step: 56260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:01:12,510-Speed 3095.94 samples/sec   Loss 11.4288   LearningRate 0.0598   Epoch: 4   Global Step: 56270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:01:15,905-Speed 3016.86 samples/sec   Loss 11.6186   LearningRate 0.0598   Epoch: 4   Global Step: 56280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:01:19,301-Speed 3016.31 samples/sec   Loss 11.5868   LearningRate 0.0598   Epoch: 4   Global Step: 56290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:01:22,654-Speed 3056.38 samples/sec   Loss 11.3809   LearningRate 0.0598   Epoch: 4   Global Step: 56300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:01:26,029-Speed 3034.60 samples/sec   Loss 11.4971   LearningRate 0.0598   Epoch: 4   Global Step: 56310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:01:29,367-Speed 3068.74 samples/sec   Loss 11.3996   LearningRate 0.0598   Epoch: 4   Global Step: 56320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:01:32,814-Speed 2971.41 samples/sec   Loss 11.3141   LearningRate 0.0598   Epoch: 4   Global Step: 56330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:01:36,207-Speed 3018.72 samples/sec   Loss 11.4727   LearningRate 0.0598   Epoch: 4   Global Step: 56340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:01:39,579-Speed 3037.91 samples/sec   Loss 11.5985   LearningRate 0.0598   Epoch: 4   Global Step: 56350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:01:42,942-Speed 3045.95 samples/sec   Loss 11.5016   LearningRate 0.0598   Epoch: 4   Global Step: 56360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:01:46,308-Speed 3043.15 samples/sec   Loss 11.5362   LearningRate 0.0598   Epoch: 4   Global Step: 56370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:01:49,688-Speed 3030.70 samples/sec   Loss 11.4841   LearningRate 0.0598   Epoch: 4   Global Step: 56380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:01:53,095-Speed 3006.86 samples/sec   Loss 11.4883   LearningRate 0.0598   Epoch: 4   Global Step: 56390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:01:56,607-Speed 2916.61 samples/sec   Loss 11.5432   LearningRate 0.0597   Epoch: 4   Global Step: 56400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:00,007-Speed 3011.89 samples/sec   Loss 11.4357   LearningRate 0.0597   Epoch: 4   Global Step: 56410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:03,435-Speed 2988.22 samples/sec   Loss 11.5043   LearningRate 0.0597   Epoch: 4   Global Step: 56420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:06,843-Speed 3005.89 samples/sec   Loss 11.3976   LearningRate 0.0597   Epoch: 4   Global Step: 56430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:10,290-Speed 2971.36 samples/sec   Loss 11.3199   LearningRate 0.0597   Epoch: 4   Global Step: 56440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:13,696-Speed 3007.23 samples/sec   Loss 11.4037   LearningRate 0.0597   Epoch: 4   Global Step: 56450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:17,188-Speed 2933.04 samples/sec   Loss 11.4951   LearningRate 0.0597   Epoch: 4   Global Step: 56460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:20,621-Speed 2984.30 samples/sec   Loss 11.4567   LearningRate 0.0597   Epoch: 4   Global Step: 56470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:24,055-Speed 2982.35 samples/sec   Loss 11.4138   LearningRate 0.0597   Epoch: 4   Global Step: 56480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:27,475-Speed 2995.15 samples/sec   Loss 11.4343   LearningRate 0.0597   Epoch: 4   Global Step: 56490   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-27 07:02:30,916-Speed 2976.72 samples/sec   Loss 11.3709   LearningRate 0.0597   Epoch: 4   Global Step: 56500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:34,310-Speed 3018.33 samples/sec   Loss 11.5384   LearningRate 0.0597   Epoch: 4   Global Step: 56510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:37,664-Speed 3053.54 samples/sec   Loss 11.2924   LearningRate 0.0597   Epoch: 4   Global Step: 56520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:41,082-Speed 2997.24 samples/sec   Loss 11.4263   LearningRate 0.0597   Epoch: 4   Global Step: 56530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:44,586-Speed 2923.41 samples/sec   Loss 11.3670   LearningRate 0.0597   Epoch: 4   Global Step: 56540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:47,945-Speed 3049.26 samples/sec   Loss 11.4072   LearningRate 0.0597   Epoch: 4   Global Step: 56550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:51,368-Speed 2992.52 samples/sec   Loss 11.3691   LearningRate 0.0596   Epoch: 4   Global Step: 56560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:54,846-Speed 2944.56 samples/sec   Loss 11.5604   LearningRate 0.0596   Epoch: 4   Global Step: 56570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:02:58,296-Speed 2969.87 samples/sec   Loss 11.4066   LearningRate 0.0596   Epoch: 4   Global Step: 56580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:01,694-Speed 3014.00 samples/sec   Loss 11.3146   LearningRate 0.0596   Epoch: 4   Global Step: 56590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:05,092-Speed 3014.07 samples/sec   Loss 11.4655   LearningRate 0.0596   Epoch: 4   Global Step: 56600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:08,566-Speed 2948.07 samples/sec   Loss 11.4526   LearningRate 0.0596   Epoch: 4   Global Step: 56610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:11,999-Speed 2983.98 samples/sec   Loss 11.2916   LearningRate 0.0596   Epoch: 4   Global Step: 56620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:15,420-Speed 2994.01 samples/sec   Loss 11.3023   LearningRate 0.0596   Epoch: 4   Global Step: 56630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:18,773-Speed 3054.63 samples/sec   Loss 11.3935   LearningRate 0.0596   Epoch: 4   Global Step: 56640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:22,126-Speed 3055.54 samples/sec   Loss 11.2830   LearningRate 0.0596   Epoch: 4   Global Step: 56650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:25,487-Speed 3047.78 samples/sec   Loss 11.4464   LearningRate 0.0596   Epoch: 4   Global Step: 56660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:28,840-Speed 3055.29 samples/sec   Loss 11.5545   LearningRate 0.0596   Epoch: 4   Global Step: 56670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:32,197-Speed 3050.52 samples/sec   Loss 11.3073   LearningRate 0.0596   Epoch: 4   Global Step: 56680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:35,552-Speed 3053.63 samples/sec   Loss 11.3365   LearningRate 0.0596   Epoch: 4   Global Step: 56690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:38,921-Speed 3039.93 samples/sec   Loss 11.4233   LearningRate 0.0596   Epoch: 4   Global Step: 56700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:03:42,332-Speed 3002.73 samples/sec   Loss 11.3306   LearningRate 0.0596   Epoch: 4   Global Step: 56710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:03:45,787-Speed 2964.93 samples/sec   Loss 11.3830   LearningRate 0.0595   Epoch: 4   Global Step: 56720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:03:49,180-Speed 3018.58 samples/sec   Loss 11.3677   LearningRate 0.0595   Epoch: 4   Global Step: 56730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:03:52,512-Speed 3074.30 samples/sec   Loss 11.6405   LearningRate 0.0595   Epoch: 4   Global Step: 56740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:03:55,973-Speed 2959.67 samples/sec   Loss 11.3311   LearningRate 0.0595   Epoch: 4   Global Step: 56750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:03:59,415-Speed 2976.21 samples/sec   Loss 11.3268   LearningRate 0.0595   Epoch: 4   Global Step: 56760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:04:02,824-Speed 3004.57 samples/sec   Loss 11.5467   LearningRate 0.0595   Epoch: 4   Global Step: 56770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:04:06,155-Speed 3074.64 samples/sec   Loss 11.3409   LearningRate 0.0595   Epoch: 4   Global Step: 56780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:04:09,536-Speed 3029.51 samples/sec   Loss 11.4723   LearningRate 0.0595   Epoch: 4   Global Step: 56790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:04:12,922-Speed 3025.20 samples/sec   Loss 11.3872   LearningRate 0.0595   Epoch: 4   Global Step: 56800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:04:16,309-Speed 3024.43 samples/sec   Loss 11.5193   LearningRate 0.0595   Epoch: 4   Global Step: 56810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:04:19,758-Speed 2970.09 samples/sec   Loss 11.3382   LearningRate 0.0595   Epoch: 4   Global Step: 56820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:04:23,164-Speed 3007.06 samples/sec   Loss 11.4808   LearningRate 0.0595   Epoch: 4   Global Step: 56830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:04:26,586-Speed 2993.37 samples/sec   Loss 11.3485   LearningRate 0.0595   Epoch: 4   Global Step: 56840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:04:29,971-Speed 3026.67 samples/sec   Loss 11.5241   LearningRate 0.0595   Epoch: 4   Global Step: 56850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:04:33,392-Speed 2993.47 samples/sec   Loss 11.3890   LearningRate 0.0595   Epoch: 4   Global Step: 56860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:04:36,741-Speed 3058.91 samples/sec   Loss 11.3799   LearningRate 0.0595   Epoch: 4   Global Step: 56870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:04:40,124-Speed 3028.08 samples/sec   Loss 11.4795   LearningRate 0.0594   Epoch: 4   Global Step: 56880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:04:43,622-Speed 2927.75 samples/sec   Loss 11.5086   LearningRate 0.0594   Epoch: 4   Global Step: 56890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:04:46,985-Speed 3045.88 samples/sec   Loss 11.2048   LearningRate 0.0594   Epoch: 4   Global Step: 56900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:04:50,371-Speed 3025.33 samples/sec   Loss 11.4811   LearningRate 0.0594   Epoch: 4   Global Step: 56910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:04:53,846-Speed 2947.06 samples/sec   Loss 11.5453   LearningRate 0.0594   Epoch: 4   Global Step: 56920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:04:57,204-Speed 3050.52 samples/sec   Loss 11.3281   LearningRate 0.0594   Epoch: 4   Global Step: 56930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:00,574-Speed 3040.12 samples/sec   Loss 11.3996   LearningRate 0.0594   Epoch: 4   Global Step: 56940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:03,929-Speed 3053.32 samples/sec   Loss 11.4104   LearningRate 0.0594   Epoch: 4   Global Step: 56950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:07,352-Speed 2992.44 samples/sec   Loss 11.5590   LearningRate 0.0594   Epoch: 4   Global Step: 56960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:10,731-Speed 3031.12 samples/sec   Loss 11.3539   LearningRate 0.0594   Epoch: 4   Global Step: 56970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:14,185-Speed 2965.50 samples/sec   Loss 11.3665   LearningRate 0.0594   Epoch: 4   Global Step: 56980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:17,526-Speed 3065.89 samples/sec   Loss 11.4097   LearningRate 0.0594   Epoch: 4   Global Step: 56990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:20,959-Speed 2984.00 samples/sec   Loss 11.3228   LearningRate 0.0594   Epoch: 4   Global Step: 57000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:24,289-Speed 3075.32 samples/sec   Loss 11.5212   LearningRate 0.0594   Epoch: 4   Global Step: 57010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:27,647-Speed 3051.09 samples/sec   Loss 11.5040   LearningRate 0.0594   Epoch: 4   Global Step: 57020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:31,108-Speed 2959.04 samples/sec   Loss 11.5582   LearningRate 0.0594   Epoch: 4   Global Step: 57030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:34,441-Speed 3073.82 samples/sec   Loss 11.4292   LearningRate 0.0593   Epoch: 4   Global Step: 57040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:37,792-Speed 3056.08 samples/sec   Loss 11.3884   LearningRate 0.0593   Epoch: 4   Global Step: 57050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:41,260-Speed 2954.24 samples/sec   Loss 11.3799   LearningRate 0.0593   Epoch: 4   Global Step: 57060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:05:44,560-Speed 3103.15 samples/sec   Loss 11.4198   LearningRate 0.0593   Epoch: 4   Global Step: 57070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:05:47,952-Speed 3020.28 samples/sec   Loss 11.3774   LearningRate 0.0593   Epoch: 4   Global Step: 57080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:05:51,373-Speed 2993.51 samples/sec   Loss 11.4725   LearningRate 0.0593   Epoch: 4   Global Step: 57090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:05:54,806-Speed 2983.85 samples/sec   Loss 11.4423   LearningRate 0.0593   Epoch: 4   Global Step: 57100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:05:58,221-Speed 3000.03 samples/sec   Loss 11.3209   LearningRate 0.0593   Epoch: 4   Global Step: 57110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:01,668-Speed 2971.88 samples/sec   Loss 11.5428   LearningRate 0.0593   Epoch: 4   Global Step: 57120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:05,089-Speed 2993.77 samples/sec   Loss 11.4490   LearningRate 0.0593   Epoch: 4   Global Step: 57130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:08,532-Speed 2974.57 samples/sec   Loss 11.3488   LearningRate 0.0593   Epoch: 4   Global Step: 57140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:11,887-Speed 3053.15 samples/sec   Loss 11.5412   LearningRate 0.0593   Epoch: 4   Global Step: 57150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:15,220-Speed 3073.97 samples/sec   Loss 11.3971   LearningRate 0.0593   Epoch: 4   Global Step: 57160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:18,615-Speed 3016.75 samples/sec   Loss 11.3871   LearningRate 0.0593   Epoch: 4   Global Step: 57170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:06:22,067-Speed 2967.21 samples/sec   Loss 11.6059   LearningRate 0.0593   Epoch: 4   Global Step: 57180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:06:25,465-Speed 3014.50 samples/sec   Loss 11.5082   LearningRate 0.0593   Epoch: 4   Global Step: 57190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:06:28,931-Speed 2955.22 samples/sec   Loss 11.3759   LearningRate 0.0593   Epoch: 4   Global Step: 57200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:06:32,331-Speed 3012.99 samples/sec   Loss 11.3609   LearningRate 0.0592   Epoch: 4   Global Step: 57210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:06:35,681-Speed 3057.85 samples/sec   Loss 11.3824   LearningRate 0.0592   Epoch: 4   Global Step: 57220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:39,103-Speed 2993.79 samples/sec   Loss 11.4371   LearningRate 0.0592   Epoch: 4   Global Step: 57230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:42,484-Speed 3028.84 samples/sec   Loss 11.3725   LearningRate 0.0592   Epoch: 4   Global Step: 57240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:45,877-Speed 3018.82 samples/sec   Loss 11.3888   LearningRate 0.0592   Epoch: 4   Global Step: 57250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:49,276-Speed 3014.19 samples/sec   Loss 11.4764   LearningRate 0.0592   Epoch: 4   Global Step: 57260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:52,678-Speed 3010.10 samples/sec   Loss 11.6325   LearningRate 0.0592   Epoch: 4   Global Step: 57270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:56,117-Speed 2979.09 samples/sec   Loss 11.3182   LearningRate 0.0592   Epoch: 4   Global Step: 57280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:06:59,499-Speed 3028.43 samples/sec   Loss 11.4831   LearningRate 0.0592   Epoch: 4   Global Step: 57290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:07:02,908-Speed 3005.07 samples/sec   Loss 11.4212   LearningRate 0.0592   Epoch: 4   Global Step: 57300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:07:06,349-Speed 2976.69 samples/sec   Loss 11.1888   LearningRate 0.0592   Epoch: 4   Global Step: 57310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:07:09,815-Speed 2954.84 samples/sec   Loss 11.3069   LearningRate 0.0592   Epoch: 4   Global Step: 57320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:13,288-Speed 2949.79 samples/sec   Loss 11.3224   LearningRate 0.0592   Epoch: 4   Global Step: 57330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:16,698-Speed 3004.08 samples/sec   Loss 11.3678   LearningRate 0.0592   Epoch: 4   Global Step: 57340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:20,102-Speed 3008.74 samples/sec   Loss 11.4376   LearningRate 0.0592   Epoch: 4   Global Step: 57350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:23,640-Speed 2894.88 samples/sec   Loss 11.2842   LearningRate 0.0592   Epoch: 4   Global Step: 57360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:27,025-Speed 3026.06 samples/sec   Loss 11.1437   LearningRate 0.0591   Epoch: 4   Global Step: 57370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:30,468-Speed 2975.27 samples/sec   Loss 11.3502   LearningRate 0.0591   Epoch: 4   Global Step: 57380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:33,868-Speed 3011.89 samples/sec   Loss 11.5038   LearningRate 0.0591   Epoch: 4   Global Step: 57390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:37,314-Speed 2972.86 samples/sec   Loss 11.3671   LearningRate 0.0591   Epoch: 4   Global Step: 57400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:40,686-Speed 3037.73 samples/sec   Loss 11.3121   LearningRate 0.0591   Epoch: 4   Global Step: 57410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:44,108-Speed 2993.42 samples/sec   Loss 11.3728   LearningRate 0.0591   Epoch: 4   Global Step: 57420   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-27 07:07:47,433-Speed 3080.47 samples/sec   Loss 11.3328   LearningRate 0.0591   Epoch: 4   Global Step: 57430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:50,789-Speed 3051.72 samples/sec   Loss 11.3456   LearningRate 0.0591   Epoch: 4   Global Step: 57440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:54,198-Speed 3004.48 samples/sec   Loss 11.4288   LearningRate 0.0591   Epoch: 4   Global Step: 57450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:07:57,547-Speed 3058.62 samples/sec   Loss 11.3556   LearningRate 0.0591   Epoch: 4   Global Step: 57460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:00,927-Speed 3030.24 samples/sec   Loss 11.3628   LearningRate 0.0591   Epoch: 4   Global Step: 57470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:04,387-Speed 2960.51 samples/sec   Loss 11.3383   LearningRate 0.0591   Epoch: 4   Global Step: 57480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:07,768-Speed 3029.47 samples/sec   Loss 11.2078   LearningRate 0.0591   Epoch: 4   Global Step: 57490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:11,146-Speed 3032.13 samples/sec   Loss 11.1937   LearningRate 0.0591   Epoch: 4   Global Step: 57500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:14,494-Speed 3060.00 samples/sec   Loss 11.3654   LearningRate 0.0591   Epoch: 4   Global Step: 57510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:17,900-Speed 3006.98 samples/sec   Loss 11.5254   LearningRate 0.0591   Epoch: 4   Global Step: 57520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:21,266-Speed 3043.13 samples/sec   Loss 11.3117   LearningRate 0.0590   Epoch: 4   Global Step: 57530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:24,774-Speed 2919.48 samples/sec   Loss 11.4761   LearningRate 0.0590   Epoch: 4   Global Step: 57540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:28,171-Speed 3015.76 samples/sec   Loss 11.3420   LearningRate 0.0590   Epoch: 4   Global Step: 57550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:31,632-Speed 2958.72 samples/sec   Loss 11.3003   LearningRate 0.0590   Epoch: 4   Global Step: 57560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:35,051-Speed 2996.35 samples/sec   Loss 11.3748   LearningRate 0.0590   Epoch: 4   Global Step: 57570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:38,473-Speed 2993.19 samples/sec   Loss 11.4269   LearningRate 0.0590   Epoch: 4   Global Step: 57580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:41,826-Speed 3054.58 samples/sec   Loss 11.5106   LearningRate 0.0590   Epoch: 4   Global Step: 57590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:45,251-Speed 2990.75 samples/sec   Loss 11.4582   LearningRate 0.0590   Epoch: 4   Global Step: 57600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:48,734-Speed 2941.10 samples/sec   Loss 11.3732   LearningRate 0.0590   Epoch: 4   Global Step: 57610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:52,240-Speed 2921.02 samples/sec   Loss 11.5324   LearningRate 0.0590   Epoch: 4   Global Step: 57620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:55,640-Speed 3012.92 samples/sec   Loss 11.3975   LearningRate 0.0590   Epoch: 4   Global Step: 57630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:08:59,033-Speed 3019.26 samples/sec   Loss 11.3027   LearningRate 0.0590   Epoch: 4   Global Step: 57640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:02,432-Speed 3013.48 samples/sec   Loss 11.3165   LearningRate 0.0590   Epoch: 4   Global Step: 57650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:05,958-Speed 2904.63 samples/sec   Loss 11.2775   LearningRate 0.0590   Epoch: 4   Global Step: 57660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:09,283-Speed 3080.70 samples/sec   Loss 11.2449   LearningRate 0.0590   Epoch: 4   Global Step: 57670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:12,708-Speed 2990.08 samples/sec   Loss 11.3460   LearningRate 0.0590   Epoch: 4   Global Step: 57680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:16,060-Speed 3055.85 samples/sec   Loss 11.3278   LearningRate 0.0589   Epoch: 4   Global Step: 57690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:19,481-Speed 2995.48 samples/sec   Loss 11.3832   LearningRate 0.0589   Epoch: 4   Global Step: 57700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:22,920-Speed 2978.85 samples/sec   Loss 11.3696   LearningRate 0.0589   Epoch: 4   Global Step: 57710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:26,363-Speed 2975.43 samples/sec   Loss 11.5166   LearningRate 0.0589   Epoch: 4   Global Step: 57720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:29,707-Speed 3063.14 samples/sec   Loss 11.4014   LearningRate 0.0589   Epoch: 4   Global Step: 57730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:33,098-Speed 3020.69 samples/sec   Loss 11.3007   LearningRate 0.0589   Epoch: 4   Global Step: 57740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:36,566-Speed 2952.80 samples/sec   Loss 11.3314   LearningRate 0.0589   Epoch: 4   Global Step: 57750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:39,976-Speed 3004.56 samples/sec   Loss 11.2217   LearningRate 0.0589   Epoch: 4   Global Step: 57760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:43,322-Speed 3060.85 samples/sec   Loss 11.2685   LearningRate 0.0589   Epoch: 4   Global Step: 57770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:46,723-Speed 3011.64 samples/sec   Loss 11.3583   LearningRate 0.0589   Epoch: 4   Global Step: 57780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:50,100-Speed 3033.32 samples/sec   Loss 11.3840   LearningRate 0.0589   Epoch: 4   Global Step: 57790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:53,592-Speed 2932.70 samples/sec   Loss 11.4444   LearningRate 0.0589   Epoch: 4   Global Step: 57800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:09:56,958-Speed 3043.62 samples/sec   Loss 11.2471   LearningRate 0.0589   Epoch: 4   Global Step: 57810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:10:00,317-Speed 3049.56 samples/sec   Loss 11.3253   LearningRate 0.0589   Epoch: 4   Global Step: 57820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:10:03,691-Speed 3036.28 samples/sec   Loss 11.3543   LearningRate 0.0589   Epoch: 4   Global Step: 57830   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-27 07:10:07,025-Speed 3072.18 samples/sec   Loss 11.2958   LearningRate 0.0589   Epoch: 4   Global Step: 57840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:10:10,378-Speed 3054.25 samples/sec   Loss 11.3952   LearningRate 0.0588   Epoch: 4   Global Step: 57850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:10:13,748-Speed 3039.73 samples/sec   Loss 11.5260   LearningRate 0.0588   Epoch: 4   Global Step: 57860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:10:17,146-Speed 3013.68 samples/sec   Loss 11.3742   LearningRate 0.0588   Epoch: 4   Global Step: 57870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:10:20,519-Speed 3037.09 samples/sec   Loss 11.3554   LearningRate 0.0588   Epoch: 4   Global Step: 57880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:10:23,933-Speed 3000.16 samples/sec   Loss 11.3668   LearningRate 0.0588   Epoch: 4   Global Step: 57890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:10:27,341-Speed 3006.35 samples/sec   Loss 11.3275   LearningRate 0.0588   Epoch: 4   Global Step: 57900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:10:30,776-Speed 2981.63 samples/sec   Loss 11.3532   LearningRate 0.0588   Epoch: 4   Global Step: 57910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:10:34,182-Speed 3007.57 samples/sec   Loss 11.2745   LearningRate 0.0588   Epoch: 4   Global Step: 57920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:10:37,543-Speed 3047.54 samples/sec   Loss 11.3975   LearningRate 0.0588   Epoch: 4   Global Step: 57930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:10:40,996-Speed 2966.30 samples/sec   Loss 11.1503   LearningRate 0.0588   Epoch: 4   Global Step: 57940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:10:44,421-Speed 2990.07 samples/sec   Loss 11.3549   LearningRate 0.0588   Epoch: 4   Global Step: 57950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:10:47,803-Speed 3029.02 samples/sec   Loss 11.3112   LearningRate 0.0588   Epoch: 4   Global Step: 57960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:10:51,161-Speed 3050.38 samples/sec   Loss 11.3954   LearningRate 0.0588   Epoch: 4   Global Step: 57970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:10:54,543-Speed 3028.68 samples/sec   Loss 11.5734   LearningRate 0.0588   Epoch: 4   Global Step: 57980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:10:57,922-Speed 3031.04 samples/sec   Loss 11.4223   LearningRate 0.0588   Epoch: 4   Global Step: 57990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:01,372-Speed 2969.83 samples/sec   Loss 11.4035   LearningRate 0.0588   Epoch: 4   Global Step: 58000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:04,742-Speed 3038.81 samples/sec   Loss 11.3843   LearningRate 0.0587   Epoch: 4   Global Step: 58010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:08,135-Speed 3018.71 samples/sec   Loss 11.2458   LearningRate 0.0587   Epoch: 4   Global Step: 58020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:11,478-Speed 3064.15 samples/sec   Loss 11.4664   LearningRate 0.0587   Epoch: 4   Global Step: 58030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:14,859-Speed 3030.06 samples/sec   Loss 11.4329   LearningRate 0.0587   Epoch: 4   Global Step: 58040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:18,226-Speed 3042.12 samples/sec   Loss 11.4566   LearningRate 0.0587   Epoch: 4   Global Step: 58050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:21,703-Speed 2946.07 samples/sec   Loss 11.3759   LearningRate 0.0587   Epoch: 4   Global Step: 58060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:25,084-Speed 3029.04 samples/sec   Loss 11.1654   LearningRate 0.0587   Epoch: 4   Global Step: 58070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:28,434-Speed 3057.77 samples/sec   Loss 11.3912   LearningRate 0.0587   Epoch: 4   Global Step: 58080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:31,789-Speed 3053.24 samples/sec   Loss 11.3301   LearningRate 0.0587   Epoch: 4   Global Step: 58090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:35,123-Speed 3072.37 samples/sec   Loss 11.3111   LearningRate 0.0587   Epoch: 4   Global Step: 58100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:38,484-Speed 3047.88 samples/sec   Loss 11.1888   LearningRate 0.0587   Epoch: 4   Global Step: 58110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:41,876-Speed 3019.98 samples/sec   Loss 11.4153   LearningRate 0.0587   Epoch: 4   Global Step: 58120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:45,325-Speed 2969.88 samples/sec   Loss 11.3556   LearningRate 0.0587   Epoch: 4   Global Step: 58130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:48,804-Speed 2943.62 samples/sec   Loss 11.3407   LearningRate 0.0587   Epoch: 4   Global Step: 58140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:52,274-Speed 2952.48 samples/sec   Loss 11.2912   LearningRate 0.0587   Epoch: 4   Global Step: 58150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:55,685-Speed 3003.09 samples/sec   Loss 11.2934   LearningRate 0.0587   Epoch: 4   Global Step: 58160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:11:59,117-Speed 2984.67 samples/sec   Loss 11.3598   LearningRate 0.0587   Epoch: 4   Global Step: 58170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:02,603-Speed 2938.33 samples/sec   Loss 11.3196   LearningRate 0.0586   Epoch: 4   Global Step: 58180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:06,018-Speed 2999.88 samples/sec   Loss 11.3887   LearningRate 0.0586   Epoch: 4   Global Step: 58190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:09,370-Speed 3054.80 samples/sec   Loss 11.2467   LearningRate 0.0586   Epoch: 4   Global Step: 58200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:12,732-Speed 3046.91 samples/sec   Loss 11.4070   LearningRate 0.0586   Epoch: 4   Global Step: 58210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:16,094-Speed 3046.57 samples/sec   Loss 11.2511   LearningRate 0.0586   Epoch: 4   Global Step: 58220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:19,554-Speed 2960.75 samples/sec   Loss 11.2783   LearningRate 0.0586   Epoch: 4   Global Step: 58230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:22,948-Speed 3018.49 samples/sec   Loss 11.3217   LearningRate 0.0586   Epoch: 4   Global Step: 58240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:26,302-Speed 3053.73 samples/sec   Loss 11.2783   LearningRate 0.0586   Epoch: 4   Global Step: 58250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:29,790-Speed 2936.57 samples/sec   Loss 11.2870   LearningRate 0.0586   Epoch: 4   Global Step: 58260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:33,111-Speed 3084.47 samples/sec   Loss 11.2994   LearningRate 0.0586   Epoch: 4   Global Step: 58270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:36,504-Speed 3018.39 samples/sec   Loss 11.3557   LearningRate 0.0586   Epoch: 4   Global Step: 58280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:39,825-Speed 3084.93 samples/sec   Loss 11.3997   LearningRate 0.0586   Epoch: 4   Global Step: 58290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:43,296-Speed 2950.63 samples/sec   Loss 11.1798   LearningRate 0.0586   Epoch: 4   Global Step: 58300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:46,744-Speed 2970.24 samples/sec   Loss 11.3447   LearningRate 0.0586   Epoch: 4   Global Step: 58310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:50,126-Speed 3029.39 samples/sec   Loss 11.2367   LearningRate 0.0586   Epoch: 4   Global Step: 58320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:53,533-Speed 3006.13 samples/sec   Loss 11.4663   LearningRate 0.0586   Epoch: 4   Global Step: 58330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:12:56,922-Speed 3022.53 samples/sec   Loss 11.1299   LearningRate 0.0585   Epoch: 4   Global Step: 58340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:00,247-Speed 3080.41 samples/sec   Loss 11.3052   LearningRate 0.0585   Epoch: 4   Global Step: 58350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:03,646-Speed 3013.87 samples/sec   Loss 11.1133   LearningRate 0.0585   Epoch: 4   Global Step: 58360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:07,111-Speed 2956.16 samples/sec   Loss 11.3770   LearningRate 0.0585   Epoch: 4   Global Step: 58370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:10,523-Speed 3001.89 samples/sec   Loss 11.2540   LearningRate 0.0585   Epoch: 4   Global Step: 58380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:13,897-Speed 3046.76 samples/sec   Loss 11.3728   LearningRate 0.0585   Epoch: 4   Global Step: 58390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:17,265-Speed 3042.03 samples/sec   Loss 11.3631   LearningRate 0.0585   Epoch: 4   Global Step: 58400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:20,666-Speed 3011.32 samples/sec   Loss 11.2826   LearningRate 0.0585   Epoch: 4   Global Step: 58410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:24,076-Speed 3004.18 samples/sec   Loss 11.1115   LearningRate 0.0585   Epoch: 4   Global Step: 58420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:27,517-Speed 2976.42 samples/sec   Loss 11.3421   LearningRate 0.0585   Epoch: 4   Global Step: 58430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:30,956-Speed 2977.99 samples/sec   Loss 11.4141   LearningRate 0.0585   Epoch: 4   Global Step: 58440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:34,386-Speed 2986.58 samples/sec   Loss 11.2073   LearningRate 0.0585   Epoch: 4   Global Step: 58450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:37,721-Speed 3071.37 samples/sec   Loss 11.4584   LearningRate 0.0585   Epoch: 4   Global Step: 58460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:41,089-Speed 3041.02 samples/sec   Loss 11.4017   LearningRate 0.0585   Epoch: 4   Global Step: 58470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:44,409-Speed 3086.39 samples/sec   Loss 11.2487   LearningRate 0.0585   Epoch: 4   Global Step: 58480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:47,723-Speed 3090.68 samples/sec   Loss 11.2098   LearningRate 0.0585   Epoch: 4   Global Step: 58490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:51,082-Speed 3049.05 samples/sec   Loss 11.1461   LearningRate 0.0584   Epoch: 4   Global Step: 58500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:13:54,502-Speed 2994.54 samples/sec   Loss 11.3112   LearningRate 0.0584   Epoch: 4   Global Step: 58510   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:13:57,864-Speed 3047.68 samples/sec   Loss 11.3976   LearningRate 0.0584   Epoch: 4   Global Step: 58520   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:14:01,256-Speed 3019.30 samples/sec   Loss 11.1774   LearningRate 0.0584   Epoch: 4   Global Step: 58530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:14:04,578-Speed 3083.99 samples/sec   Loss 11.2760   LearningRate 0.0584   Epoch: 4   Global Step: 58540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:14:07,932-Speed 3053.82 samples/sec   Loss 11.1727   LearningRate 0.0584   Epoch: 4   Global Step: 58550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:14:11,375-Speed 2975.30 samples/sec   Loss 11.2605   LearningRate 0.0584   Epoch: 4   Global Step: 58560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:14:14,742-Speed 3041.97 samples/sec   Loss 11.3613   LearningRate 0.0584   Epoch: 4   Global Step: 58570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:14:18,112-Speed 3039.36 samples/sec   Loss 11.2938   LearningRate 0.0584   Epoch: 4   Global Step: 58580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:14:21,576-Speed 2957.28 samples/sec   Loss 11.2315   LearningRate 0.0584   Epoch: 4   Global Step: 58590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:14:24,979-Speed 3009.51 samples/sec   Loss 11.3377   LearningRate 0.0584   Epoch: 4   Global Step: 58600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:14:28,369-Speed 3021.58 samples/sec   Loss 11.3645   LearningRate 0.0584   Epoch: 4   Global Step: 58610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:14:31,711-Speed 3065.18 samples/sec   Loss 11.3862   LearningRate 0.0584   Epoch: 4   Global Step: 58620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:14:35,053-Speed 3065.48 samples/sec   Loss 11.2631   LearningRate 0.0584   Epoch: 4   Global Step: 58630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:14:38,409-Speed 3052.27 samples/sec   Loss 11.3678   LearningRate 0.0584   Epoch: 4   Global Step: 58640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:14:41,798-Speed 3021.53 samples/sec   Loss 11.2307   LearningRate 0.0584   Epoch: 4   Global Step: 58650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:14:45,169-Speed 3039.40 samples/sec   Loss 11.3129   LearningRate 0.0583   Epoch: 4   Global Step: 58660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:14:48,629-Speed 2960.38 samples/sec   Loss 11.2227   LearningRate 0.0583   Epoch: 4   Global Step: 58670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:14:52,000-Speed 3039.08 samples/sec   Loss 11.4249   LearningRate 0.0583   Epoch: 4   Global Step: 58680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:14:55,407-Speed 3006.32 samples/sec   Loss 11.2967   LearningRate 0.0583   Epoch: 4   Global Step: 58690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:14:58,824-Speed 2998.20 samples/sec   Loss 11.2023   LearningRate 0.0583   Epoch: 4   Global Step: 58700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:02,337-Speed 2915.01 samples/sec   Loss 11.3233   LearningRate 0.0583   Epoch: 4   Global Step: 58710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:05,847-Speed 2918.46 samples/sec   Loss 11.1907   LearningRate 0.0583   Epoch: 4   Global Step: 58720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:09,221-Speed 3035.90 samples/sec   Loss 11.2253   LearningRate 0.0583   Epoch: 4   Global Step: 58730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:12,632-Speed 3003.28 samples/sec   Loss 11.1547   LearningRate 0.0583   Epoch: 4   Global Step: 58740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:15,957-Speed 3079.96 samples/sec   Loss 11.2474   LearningRate 0.0583   Epoch: 4   Global Step: 58750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:19,337-Speed 3030.77 samples/sec   Loss 11.3071   LearningRate 0.0583   Epoch: 4   Global Step: 58760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:22,686-Speed 3058.94 samples/sec   Loss 11.2487   LearningRate 0.0583   Epoch: 4   Global Step: 58770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:26,020-Speed 3071.93 samples/sec   Loss 11.2405   LearningRate 0.0583   Epoch: 4   Global Step: 58780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:29,356-Speed 3070.65 samples/sec   Loss 11.4408   LearningRate 0.0583   Epoch: 4   Global Step: 58790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:32,697-Speed 3066.85 samples/sec   Loss 11.1309   LearningRate 0.0583   Epoch: 4   Global Step: 58800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:35,995-Speed 3106.32 samples/sec   Loss 11.2843   LearningRate 0.0583   Epoch: 4   Global Step: 58810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:39,353-Speed 3050.36 samples/sec   Loss 11.3581   LearningRate 0.0583   Epoch: 4   Global Step: 58820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:42,820-Speed 2954.22 samples/sec   Loss 11.2335   LearningRate 0.0582   Epoch: 4   Global Step: 58830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:46,154-Speed 3072.53 samples/sec   Loss 11.5452   LearningRate 0.0582   Epoch: 4   Global Step: 58840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:49,558-Speed 3008.78 samples/sec   Loss 11.1623   LearningRate 0.0582   Epoch: 4   Global Step: 58850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:52,986-Speed 2987.78 samples/sec   Loss 11.1400   LearningRate 0.0582   Epoch: 4   Global Step: 58860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:56,324-Speed 3069.42 samples/sec   Loss 11.1985   LearningRate 0.0582   Epoch: 4   Global Step: 58870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:15:59,784-Speed 2960.12 samples/sec   Loss 11.3073   LearningRate 0.0582   Epoch: 4   Global Step: 58880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:03,159-Speed 3035.79 samples/sec   Loss 11.2745   LearningRate 0.0582   Epoch: 4   Global Step: 58890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:06,520-Speed 3047.46 samples/sec   Loss 11.2564   LearningRate 0.0582   Epoch: 4   Global Step: 58900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:09,891-Speed 3038.15 samples/sec   Loss 11.2157   LearningRate 0.0582   Epoch: 4   Global Step: 58910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:13,249-Speed 3050.91 samples/sec   Loss 11.2782   LearningRate 0.0582   Epoch: 4   Global Step: 58920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:16,556-Speed 3097.82 samples/sec   Loss 11.3713   LearningRate 0.0582   Epoch: 4   Global Step: 58930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:19,874-Speed 3087.11 samples/sec   Loss 11.3469   LearningRate 0.0582   Epoch: 4   Global Step: 58940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:23,210-Speed 3069.97 samples/sec   Loss 11.2098   LearningRate 0.0582   Epoch: 4   Global Step: 58950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:26,538-Speed 3078.38 samples/sec   Loss 11.2550   LearningRate 0.0582   Epoch: 4   Global Step: 58960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:29,924-Speed 3024.78 samples/sec   Loss 11.3209   LearningRate 0.0582   Epoch: 4   Global Step: 58970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:33,323-Speed 3013.17 samples/sec   Loss 11.1863   LearningRate 0.0582   Epoch: 4   Global Step: 58980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:36,711-Speed 3024.25 samples/sec   Loss 11.2248   LearningRate 0.0581   Epoch: 4   Global Step: 58990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:40,107-Speed 3015.41 samples/sec   Loss 11.2542   LearningRate 0.0581   Epoch: 4   Global Step: 59000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:43,535-Speed 2988.73 samples/sec   Loss 11.3270   LearningRate 0.0581   Epoch: 4   Global Step: 59010   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-27 07:16:46,870-Speed 3071.89 samples/sec   Loss 11.0734   LearningRate 0.0581   Epoch: 4   Global Step: 59020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:50,263-Speed 3017.76 samples/sec   Loss 11.2173   LearningRate 0.0581   Epoch: 4   Global Step: 59030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:53,622-Speed 3050.13 samples/sec   Loss 11.2629   LearningRate 0.0581   Epoch: 4   Global Step: 59040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:16:57,006-Speed 3026.86 samples/sec   Loss 11.2636   LearningRate 0.0581   Epoch: 4   Global Step: 59050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:00,435-Speed 2987.55 samples/sec   Loss 11.2209   LearningRate 0.0581   Epoch: 4   Global Step: 59060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:03,851-Speed 2998.28 samples/sec   Loss 11.2626   LearningRate 0.0581   Epoch: 4   Global Step: 59070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:07,226-Speed 3034.93 samples/sec   Loss 11.2610   LearningRate 0.0581   Epoch: 4   Global Step: 59080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:10,560-Speed 3071.50 samples/sec   Loss 11.3697   LearningRate 0.0581   Epoch: 4   Global Step: 59090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:13,950-Speed 3022.23 samples/sec   Loss 11.3169   LearningRate 0.0581   Epoch: 4   Global Step: 59100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:17,291-Speed 3065.56 samples/sec   Loss 11.1991   LearningRate 0.0581   Epoch: 4   Global Step: 59110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:20,663-Speed 3037.77 samples/sec   Loss 11.1506   LearningRate 0.0581   Epoch: 4   Global Step: 59120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:24,044-Speed 3028.82 samples/sec   Loss 11.1910   LearningRate 0.0581   Epoch: 4   Global Step: 59130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:27,388-Speed 3063.48 samples/sec   Loss 11.1781   LearningRate 0.0581   Epoch: 4   Global Step: 59140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:30,855-Speed 2954.66 samples/sec   Loss 11.2046   LearningRate 0.0580   Epoch: 4   Global Step: 59150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:34,204-Speed 3058.08 samples/sec   Loss 11.4008   LearningRate 0.0580   Epoch: 4   Global Step: 59160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:37,571-Speed 3042.73 samples/sec   Loss 11.3308   LearningRate 0.0580   Epoch: 4   Global Step: 59170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:41,004-Speed 2983.01 samples/sec   Loss 11.1938   LearningRate 0.0580   Epoch: 4   Global Step: 59180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:44,418-Speed 2999.97 samples/sec   Loss 11.2794   LearningRate 0.0580   Epoch: 4   Global Step: 59190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:47,791-Speed 3037.66 samples/sec   Loss 11.3349   LearningRate 0.0580   Epoch: 4   Global Step: 59200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:51,125-Speed 3071.89 samples/sec   Loss 11.4177   LearningRate 0.0580   Epoch: 4   Global Step: 59210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:17:54,484-Speed 3048.84 samples/sec   Loss 11.1589   LearningRate 0.0580   Epoch: 4   Global Step: 59220   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-27 07:17:57,901-Speed 2998.09 samples/sec   Loss 11.4366   LearningRate 0.0580   Epoch: 4   Global Step: 59230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:01,259-Speed 3049.64 samples/sec   Loss 11.2909   LearningRate 0.0580   Epoch: 4   Global Step: 59240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:04,611-Speed 3055.86 samples/sec   Loss 11.1808   LearningRate 0.0580   Epoch: 4   Global Step: 59250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:07,957-Speed 3062.22 samples/sec   Loss 11.2847   LearningRate 0.0580   Epoch: 4   Global Step: 59260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:11,395-Speed 2979.52 samples/sec   Loss 11.0426   LearningRate 0.0580   Epoch: 4   Global Step: 59270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:14,804-Speed 3004.58 samples/sec   Loss 11.2825   LearningRate 0.0580   Epoch: 4   Global Step: 59280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:18,176-Speed 3037.51 samples/sec   Loss 11.3475   LearningRate 0.0580   Epoch: 4   Global Step: 59290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:21,557-Speed 3029.28 samples/sec   Loss 11.2898   LearningRate 0.0580   Epoch: 4   Global Step: 59300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:24,989-Speed 2984.51 samples/sec   Loss 11.2033   LearningRate 0.0580   Epoch: 4   Global Step: 59310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:28,348-Speed 3049.79 samples/sec   Loss 11.1765   LearningRate 0.0579   Epoch: 4   Global Step: 59320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:31,671-Speed 3082.60 samples/sec   Loss 11.2205   LearningRate 0.0579   Epoch: 4   Global Step: 59330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:35,025-Speed 3053.75 samples/sec   Loss 11.2036   LearningRate 0.0579   Epoch: 4   Global Step: 59340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:38,375-Speed 3057.24 samples/sec   Loss 11.1290   LearningRate 0.0579   Epoch: 4   Global Step: 59350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:18:41,736-Speed 3048.28 samples/sec   Loss 11.2802   LearningRate 0.0579   Epoch: 4   Global Step: 59360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:18:45,059-Speed 3082.47 samples/sec   Loss 11.2414   LearningRate 0.0579   Epoch: 4   Global Step: 59370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:18:48,422-Speed 3045.60 samples/sec   Loss 11.2938   LearningRate 0.0579   Epoch: 4   Global Step: 59380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:18:51,762-Speed 3066.07 samples/sec   Loss 11.2341   LearningRate 0.0579   Epoch: 4   Global Step: 59390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:18:55,168-Speed 3007.60 samples/sec   Loss 11.2467   LearningRate 0.0579   Epoch: 4   Global Step: 59400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:18:58,598-Speed 2986.29 samples/sec   Loss 11.1551   LearningRate 0.0579   Epoch: 4   Global Step: 59410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:19:01,999-Speed 3012.17 samples/sec   Loss 10.9473   LearningRate 0.0579   Epoch: 4   Global Step: 59420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:19:05,412-Speed 3000.97 samples/sec   Loss 11.2783   LearningRate 0.0579   Epoch: 4   Global Step: 59430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:19:08,776-Speed 3044.88 samples/sec   Loss 11.2068   LearningRate 0.0579   Epoch: 4   Global Step: 59440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:19:12,126-Speed 3057.82 samples/sec   Loss 11.3180   LearningRate 0.0579   Epoch: 4   Global Step: 59450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:19:15,603-Speed 2945.89 samples/sec   Loss 11.1940   LearningRate 0.0579   Epoch: 4   Global Step: 59460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:19,102-Speed 2927.32 samples/sec   Loss 11.3416   LearningRate 0.0579   Epoch: 4   Global Step: 59470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:22,526-Speed 2991.18 samples/sec   Loss 11.1437   LearningRate 0.0578   Epoch: 4   Global Step: 59480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:25,872-Speed 3061.36 samples/sec   Loss 11.2616   LearningRate 0.0578   Epoch: 4   Global Step: 59490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:29,330-Speed 2961.50 samples/sec   Loss 11.2547   LearningRate 0.0578   Epoch: 4   Global Step: 59500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:32,752-Speed 2993.90 samples/sec   Loss 11.2030   LearningRate 0.0578   Epoch: 4   Global Step: 59510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:36,111-Speed 3049.41 samples/sec   Loss 11.2438   LearningRate 0.0578   Epoch: 4   Global Step: 59520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:39,537-Speed 2988.93 samples/sec   Loss 11.1023   LearningRate 0.0578   Epoch: 4   Global Step: 59530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:42,968-Speed 2985.99 samples/sec   Loss 11.1739   LearningRate 0.0578   Epoch: 4   Global Step: 59540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:46,337-Speed 3040.05 samples/sec   Loss 11.1926   LearningRate 0.0578   Epoch: 4   Global Step: 59550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:49,688-Speed 3056.30 samples/sec   Loss 11.1892   LearningRate 0.0578   Epoch: 4   Global Step: 59560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:53,057-Speed 3040.68 samples/sec   Loss 11.2705   LearningRate 0.0578   Epoch: 4   Global Step: 59570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:56,462-Speed 3008.21 samples/sec   Loss 11.2677   LearningRate 0.0578   Epoch: 4   Global Step: 59580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:19:59,836-Speed 3035.57 samples/sec   Loss 11.2732   LearningRate 0.0578   Epoch: 4   Global Step: 59590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:20:03,245-Speed 3004.56 samples/sec   Loss 11.2545   LearningRate 0.0578   Epoch: 4   Global Step: 59600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:20:06,591-Speed 3061.65 samples/sec   Loss 11.0794   LearningRate 0.0578   Epoch: 4   Global Step: 59610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:20:10,012-Speed 2993.64 samples/sec   Loss 11.1802   LearningRate 0.0578   Epoch: 4   Global Step: 59620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:20:13,419-Speed 3006.71 samples/sec   Loss 11.1128   LearningRate 0.0578   Epoch: 4   Global Step: 59630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:20:16,781-Speed 3046.20 samples/sec   Loss 11.2447   LearningRate 0.0577   Epoch: 4   Global Step: 59640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:20:20,110-Speed 3076.97 samples/sec   Loss 11.4589   LearningRate 0.0577   Epoch: 4   Global Step: 59650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:20:23,536-Speed 2989.80 samples/sec   Loss 11.3408   LearningRate 0.0577   Epoch: 4   Global Step: 59660   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-27 07:20:26,898-Speed 3046.59 samples/sec   Loss 11.3307   LearningRate 0.0577   Epoch: 4   Global Step: 59670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:20:30,312-Speed 3000.55 samples/sec   Loss 11.1660   LearningRate 0.0577   Epoch: 4   Global Step: 59680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:20:33,689-Speed 3033.16 samples/sec   Loss 11.2469   LearningRate 0.0577   Epoch: 4   Global Step: 59690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:20:36,997-Speed 3095.97 samples/sec   Loss 11.2483   LearningRate 0.0577   Epoch: 4   Global Step: 59700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:20:40,373-Speed 3033.80 samples/sec   Loss 11.1691   LearningRate 0.0577   Epoch: 4   Global Step: 59710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:20:43,772-Speed 3013.69 samples/sec   Loss 11.2857   LearningRate 0.0577   Epoch: 4   Global Step: 59720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:20:47,125-Speed 3054.99 samples/sec   Loss 11.1574   LearningRate 0.0577   Epoch: 4   Global Step: 59730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:20:50,462-Speed 3069.18 samples/sec   Loss 11.2628   LearningRate 0.0577   Epoch: 4   Global Step: 59740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:20:53,834-Speed 3038.20 samples/sec   Loss 11.2246   LearningRate 0.0577   Epoch: 4   Global Step: 59750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:20:57,164-Speed 3075.69 samples/sec   Loss 11.2061   LearningRate 0.0577   Epoch: 4   Global Step: 59760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:21:00,545-Speed 3029.50 samples/sec   Loss 11.4423   LearningRate 0.0577   Epoch: 4   Global Step: 59770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:21:03,947-Speed 3010.63 samples/sec   Loss 11.1429   LearningRate 0.0577   Epoch: 4   Global Step: 59780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:21:07,335-Speed 3023.08 samples/sec   Loss 11.2141   LearningRate 0.0577   Epoch: 4   Global Step: 59790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:21:10,669-Speed 3072.94 samples/sec   Loss 11.1005   LearningRate 0.0577   Epoch: 4   Global Step: 59800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:21:14,008-Speed 3067.65 samples/sec   Loss 11.4540   LearningRate 0.0576   Epoch: 4   Global Step: 59810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:21:17,376-Speed 3041.92 samples/sec   Loss 11.3061   LearningRate 0.0576   Epoch: 4   Global Step: 59820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:21:20,756-Speed 3030.37 samples/sec   Loss 11.1445   LearningRate 0.0576   Epoch: 4   Global Step: 59830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:21:24,149-Speed 3019.10 samples/sec   Loss 11.0281   LearningRate 0.0576   Epoch: 4   Global Step: 59840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:21:27,534-Speed 3026.01 samples/sec   Loss 11.0818   LearningRate 0.0576   Epoch: 4   Global Step: 59850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:21:30,952-Speed 2996.86 samples/sec   Loss 11.1424   LearningRate 0.0576   Epoch: 4   Global Step: 59860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:21:34,400-Speed 2970.41 samples/sec   Loss 11.3719   LearningRate 0.0576   Epoch: 4   Global Step: 59870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:21:37,884-Speed 2939.70 samples/sec   Loss 11.1811   LearningRate 0.0576   Epoch: 4   Global Step: 59880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:21:41,353-Speed 2953.40 samples/sec   Loss 11.2454   LearningRate 0.0576   Epoch: 4   Global Step: 59890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:21:44,781-Speed 2987.79 samples/sec   Loss 11.2274   LearningRate 0.0576   Epoch: 4   Global Step: 59900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:21:48,169-Speed 3023.08 samples/sec   Loss 11.2519   LearningRate 0.0576   Epoch: 4   Global Step: 59910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:21:51,536-Speed 3042.36 samples/sec   Loss 11.1479   LearningRate 0.0576   Epoch: 4   Global Step: 59920   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-04-27 07:21:54,938-Speed 3010.49 samples/sec   Loss 11.1108   LearningRate 0.0576   Epoch: 4   Global Step: 59930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:21:58,374-Speed 2982.24 samples/sec   Loss 11.2713   LearningRate 0.0576   Epoch: 4   Global Step: 59940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:01,823-Speed 2970.50 samples/sec   Loss 11.2870   LearningRate 0.0576   Epoch: 4   Global Step: 59950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:05,184-Speed 3046.76 samples/sec   Loss 11.2219   LearningRate 0.0576   Epoch: 4   Global Step: 59960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:08,613-Speed 2987.69 samples/sec   Loss 11.2985   LearningRate 0.0575   Epoch: 4   Global Step: 59970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:11,986-Speed 3037.04 samples/sec   Loss 11.1085   LearningRate 0.0575   Epoch: 4   Global Step: 59980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:15,373-Speed 3023.65 samples/sec   Loss 11.2134   LearningRate 0.0575   Epoch: 4   Global Step: 59990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:18,702-Speed 3077.20 samples/sec   Loss 11.1727   LearningRate 0.0575   Epoch: 4   Global Step: 60000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:22,070-Speed 3041.66 samples/sec   Loss 11.1138   LearningRate 0.0575   Epoch: 4   Global Step: 60010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:25,499-Speed 2987.00 samples/sec   Loss 11.1712   LearningRate 0.0575   Epoch: 4   Global Step: 60020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:28,862-Speed 3044.92 samples/sec   Loss 11.2647   LearningRate 0.0575   Epoch: 4   Global Step: 60030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:32,310-Speed 2971.22 samples/sec   Loss 11.1341   LearningRate 0.0575   Epoch: 4   Global Step: 60040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:35,705-Speed 3016.81 samples/sec   Loss 11.2162   LearningRate 0.0575   Epoch: 4   Global Step: 60050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:39,080-Speed 3034.62 samples/sec   Loss 11.1132   LearningRate 0.0575   Epoch: 4   Global Step: 60060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:22:42,442-Speed 3047.51 samples/sec   Loss 11.2969   LearningRate 0.0575   Epoch: 4   Global Step: 60070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:22:45,899-Speed 2962.48 samples/sec   Loss 11.2354   LearningRate 0.0575   Epoch: 4   Global Step: 60080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:22:49,374-Speed 2947.94 samples/sec   Loss 11.2254   LearningRate 0.0575   Epoch: 4   Global Step: 60090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:22:52,675-Speed 3102.97 samples/sec   Loss 11.0932   LearningRate 0.0575   Epoch: 4   Global Step: 60100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:22:56,061-Speed 3025.13 samples/sec   Loss 11.1883   LearningRate 0.0575   Epoch: 4   Global Step: 60110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:22:59,416-Speed 3053.13 samples/sec   Loss 11.2048   LearningRate 0.0575   Epoch: 4   Global Step: 60120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:02,832-Speed 2998.23 samples/sec   Loss 11.2620   LearningRate 0.0574   Epoch: 4   Global Step: 60130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:06,261-Speed 2987.47 samples/sec   Loss 11.3082   LearningRate 0.0574   Epoch: 4   Global Step: 60140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:09,739-Speed 2945.01 samples/sec   Loss 11.2842   LearningRate 0.0574   Epoch: 4   Global Step: 60150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:13,102-Speed 3045.65 samples/sec   Loss 11.1924   LearningRate 0.0574   Epoch: 4   Global Step: 60160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:16,475-Speed 3037.06 samples/sec   Loss 11.2080   LearningRate 0.0574   Epoch: 4   Global Step: 60170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:23:19,883-Speed 3005.77 samples/sec   Loss 10.8748   LearningRate 0.0574   Epoch: 4   Global Step: 60180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:23:23,208-Speed 3080.16 samples/sec   Loss 11.2766   LearningRate 0.0574   Epoch: 4   Global Step: 60190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:23:26,552-Speed 3063.17 samples/sec   Loss 11.2931   LearningRate 0.0574   Epoch: 4   Global Step: 60200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:23:29,989-Speed 2980.10 samples/sec   Loss 11.2317   LearningRate 0.0574   Epoch: 4   Global Step: 60210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:33,460-Speed 2950.80 samples/sec   Loss 11.1386   LearningRate 0.0574   Epoch: 4   Global Step: 60220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:36,885-Speed 2990.70 samples/sec   Loss 11.2783   LearningRate 0.0574   Epoch: 4   Global Step: 60230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:40,264-Speed 3030.80 samples/sec   Loss 11.1524   LearningRate 0.0574   Epoch: 4   Global Step: 60240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:43,626-Speed 3047.27 samples/sec   Loss 11.1620   LearningRate 0.0574   Epoch: 4   Global Step: 60250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:47,035-Speed 3004.20 samples/sec   Loss 11.3498   LearningRate 0.0574   Epoch: 4   Global Step: 60260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:50,413-Speed 3032.14 samples/sec   Loss 11.0794   LearningRate 0.0574   Epoch: 4   Global Step: 60270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:53,748-Speed 3071.66 samples/sec   Loss 11.0537   LearningRate 0.0574   Epoch: 4   Global Step: 60280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:23:57,104-Speed 3052.00 samples/sec   Loss 11.2670   LearningRate 0.0574   Epoch: 4   Global Step: 60290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:24:00,539-Speed 2982.53 samples/sec   Loss 11.0650   LearningRate 0.0573   Epoch: 4   Global Step: 60300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:24:03,898-Speed 3049.01 samples/sec   Loss 11.2761   LearningRate 0.0573   Epoch: 4   Global Step: 60310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:24:07,236-Speed 3068.99 samples/sec   Loss 11.1370   LearningRate 0.0573   Epoch: 4   Global Step: 60320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:24:10,614-Speed 3031.98 samples/sec   Loss 11.0291   LearningRate 0.0573   Epoch: 4   Global Step: 60330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:24:14,013-Speed 3013.46 samples/sec   Loss 11.1805   LearningRate 0.0573   Epoch: 4   Global Step: 60340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:24:17,403-Speed 3021.52 samples/sec   Loss 11.1115   LearningRate 0.0573   Epoch: 4   Global Step: 60350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:24:20,816-Speed 3001.87 samples/sec   Loss 11.2587   LearningRate 0.0573   Epoch: 4   Global Step: 60360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:24:24,281-Speed 2956.01 samples/sec   Loss 11.0560   LearningRate 0.0573   Epoch: 4   Global Step: 60370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:24:27,767-Speed 2938.12 samples/sec   Loss 11.1540   LearningRate 0.0573   Epoch: 4   Global Step: 60380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:24:31,195-Speed 2988.39 samples/sec   Loss 11.1942   LearningRate 0.0573   Epoch: 4   Global Step: 60390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:24:34,599-Speed 3008.45 samples/sec   Loss 11.0612   LearningRate 0.0573   Epoch: 4   Global Step: 60400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:24:37,990-Speed 3020.56 samples/sec   Loss 11.0661   LearningRate 0.0573   Epoch: 4   Global Step: 60410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:24:41,407-Speed 2997.67 samples/sec   Loss 11.1319   LearningRate 0.0573   Epoch: 4   Global Step: 60420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:24:44,823-Speed 2998.29 samples/sec   Loss 11.2434   LearningRate 0.0573   Epoch: 4   Global Step: 60430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:24:48,161-Speed 3068.59 samples/sec   Loss 11.0640   LearningRate 0.0573   Epoch: 4   Global Step: 60440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:24:51,622-Speed 2959.83 samples/sec   Loss 11.3050   LearningRate 0.0573   Epoch: 4   Global Step: 60450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:24:54,972-Speed 3056.97 samples/sec   Loss 11.3213   LearningRate 0.0572   Epoch: 4   Global Step: 60460   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:24:58,429-Speed 2963.18 samples/sec   Loss 11.3371   LearningRate 0.0572   Epoch: 4   Global Step: 60470   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:25:01,877-Speed 2970.77 samples/sec   Loss 11.2885   LearningRate 0.0572   Epoch: 4   Global Step: 60480   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:25:05,316-Speed 2978.84 samples/sec   Loss 11.2164   LearningRate 0.0572   Epoch: 4   Global Step: 60490   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:25:08,743-Speed 2988.41 samples/sec   Loss 11.1755   LearningRate 0.0572   Epoch: 4   Global Step: 60500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:25:12,096-Speed 3055.01 samples/sec   Loss 11.1525   LearningRate 0.0572   Epoch: 4   Global Step: 60510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:25:15,499-Speed 3009.79 samples/sec   Loss 11.1620   LearningRate 0.0572   Epoch: 4   Global Step: 60520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:25:18,916-Speed 2997.33 samples/sec   Loss 11.0883   LearningRate 0.0572   Epoch: 4   Global Step: 60530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:25:22,355-Speed 2979.03 samples/sec   Loss 11.1982   LearningRate 0.0572   Epoch: 4   Global Step: 60540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:25:25,759-Speed 3008.27 samples/sec   Loss 11.2685   LearningRate 0.0572   Epoch: 4   Global Step: 60550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:25:29,170-Speed 3003.71 samples/sec   Loss 11.1812   LearningRate 0.0572   Epoch: 4   Global Step: 60560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:25:32,516-Speed 3061.10 samples/sec   Loss 11.1234   LearningRate 0.0572   Epoch: 4   Global Step: 60570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:25:35,847-Speed 3074.55 samples/sec   Loss 11.1073   LearningRate 0.0572   Epoch: 4   Global Step: 60580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:25:39,231-Speed 3027.42 samples/sec   Loss 11.1933   LearningRate 0.0572   Epoch: 4   Global Step: 60590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:25:42,703-Speed 2950.14 samples/sec   Loss 11.2736   LearningRate 0.0572   Epoch: 4   Global Step: 60600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:25:46,066-Speed 3045.61 samples/sec   Loss 11.3268   LearningRate 0.0572   Epoch: 4   Global Step: 60610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:25:49,446-Speed 3029.83 samples/sec   Loss 11.0463   LearningRate 0.0572   Epoch: 4   Global Step: 60620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:25:52,767-Speed 3084.66 samples/sec   Loss 11.2316   LearningRate 0.0571   Epoch: 4   Global Step: 60630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:25:56,090-Speed 3081.90 samples/sec   Loss 11.1173   LearningRate 0.0571   Epoch: 4   Global Step: 60640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:25:59,477-Speed 3024.10 samples/sec   Loss 11.2336   LearningRate 0.0571   Epoch: 4   Global Step: 60650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:26:02,806-Speed 3077.13 samples/sec   Loss 11.1490   LearningRate 0.0571   Epoch: 4   Global Step: 60660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:26:06,131-Speed 3080.85 samples/sec   Loss 11.0104   LearningRate 0.0571   Epoch: 4   Global Step: 60670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:26:09,438-Speed 3098.00 samples/sec   Loss 11.1834   LearningRate 0.0571   Epoch: 4   Global Step: 60680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:26:12,765-Speed 3078.52 samples/sec   Loss 11.1156   LearningRate 0.0571   Epoch: 4   Global Step: 60690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:26:16,081-Speed 3089.13 samples/sec   Loss 11.2682   LearningRate 0.0571   Epoch: 4   Global Step: 60700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:26:19,416-Speed 3070.82 samples/sec   Loss 11.1588   LearningRate 0.0571   Epoch: 4   Global Step: 60710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:26:22,792-Speed 3033.50 samples/sec   Loss 11.1742   LearningRate 0.0571   Epoch: 4   Global Step: 60720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:26:26,160-Speed 3041.25 samples/sec   Loss 11.1945   LearningRate 0.0571   Epoch: 4   Global Step: 60730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:26:29,514-Speed 3054.48 samples/sec   Loss 10.9830   LearningRate 0.0571   Epoch: 4   Global Step: 60740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:26:32,853-Speed 3067.23 samples/sec   Loss 11.1539   LearningRate 0.0571   Epoch: 4   Global Step: 60750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:26:36,188-Speed 3071.43 samples/sec   Loss 11.0890   LearningRate 0.0571   Epoch: 4   Global Step: 60760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:26:39,580-Speed 3019.34 samples/sec   Loss 11.1861   LearningRate 0.0571   Epoch: 4   Global Step: 60770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:26:42,889-Speed 3096.15 samples/sec   Loss 11.0513   LearningRate 0.0571   Epoch: 4   Global Step: 60780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:26:46,279-Speed 3021.35 samples/sec   Loss 11.1671   LearningRate 0.0570   Epoch: 4   Global Step: 60790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:26:49,670-Speed 3021.28 samples/sec   Loss 11.1419   LearningRate 0.0570   Epoch: 4   Global Step: 60800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:26:53,052-Speed 3028.53 samples/sec   Loss 11.1774   LearningRate 0.0570   Epoch: 4   Global Step: 60810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:26:56,451-Speed 3013.49 samples/sec   Loss 11.1326   LearningRate 0.0570   Epoch: 4   Global Step: 60820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:26:59,863-Speed 3001.89 samples/sec   Loss 11.1420   LearningRate 0.0570   Epoch: 4   Global Step: 60830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:27:03,194-Speed 3075.30 samples/sec   Loss 11.0427   LearningRate 0.0570   Epoch: 4   Global Step: 60840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:27:06,568-Speed 3036.18 samples/sec   Loss 11.0559   LearningRate 0.0570   Epoch: 4   Global Step: 60850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:27:09,969-Speed 3011.18 samples/sec   Loss 11.1904   LearningRate 0.0570   Epoch: 4   Global Step: 60860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:27:13,282-Speed 3091.67 samples/sec   Loss 11.0762   LearningRate 0.0570   Epoch: 4   Global Step: 60870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:27:16,625-Speed 3064.03 samples/sec   Loss 11.2240   LearningRate 0.0570   Epoch: 4   Global Step: 60880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:27:20,020-Speed 3017.78 samples/sec   Loss 11.1150   LearningRate 0.0570   Epoch: 4   Global Step: 60890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:27:23,377-Speed 3050.42 samples/sec   Loss 11.1219   LearningRate 0.0570   Epoch: 4   Global Step: 60900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:27:26,790-Speed 3001.60 samples/sec   Loss 11.0545   LearningRate 0.0570   Epoch: 4   Global Step: 60910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:27:30,174-Speed 3026.33 samples/sec   Loss 10.9691   LearningRate 0.0570   Epoch: 4   Global Step: 60920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:27:33,548-Speed 3036.24 samples/sec   Loss 11.1477   LearningRate 0.0570   Epoch: 4   Global Step: 60930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 07:27:36,865-Speed 3088.25 samples/sec   Loss 11.1679   LearningRate 0.0570   Epoch: 4   Global Step: 60940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:27:40,203-Speed 3069.41 samples/sec   Loss 11.1470   LearningRate 0.0569   Epoch: 4   Global Step: 60950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:27:43,593-Speed 3021.51 samples/sec   Loss 10.9971   LearningRate 0.0569   Epoch: 4   Global Step: 60960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:27:47,002-Speed 3004.55 samples/sec   Loss 11.1323   LearningRate 0.0569   Epoch: 4   Global Step: 60970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:27:50,408-Speed 3007.91 samples/sec   Loss 11.0824   LearningRate 0.0569   Epoch: 4   Global Step: 60980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 07:27:53,848-Speed 2977.95 samples/sec   Loss 11.1218   LearningRate 0.0569   Epoch: 4   Global Step: 60990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 07:27:57,229-Speed 3030.94 samples/sec   Loss 11.1417   LearningRate 0.0569   Epoch: 4   Global Step: 61000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 07:28:00,586-Speed 3051.08 samples/sec   Loss 11.0456   LearningRate 0.0569   Epoch: 4   Global Step: 61010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 07:28:04,006-Speed 2995.53 samples/sec   Loss 11.1144   LearningRate 0.0569   Epoch: 4   Global Step: 61020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 07:28:07,360-Speed 3053.29 samples/sec   Loss 10.9617   LearningRate 0.0569   Epoch: 4   Global Step: 61030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 07:28:10,682-Speed 3083.52 samples/sec   Loss 11.1223   LearningRate 0.0569   Epoch: 4   Global Step: 61040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 07:28:14,092-Speed 3003.72 samples/sec   Loss 11.1110   LearningRate 0.0569   Epoch: 4   Global Step: 61050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 07:28:17,460-Speed 3041.03 samples/sec   Loss 10.9734   LearningRate 0.0569   Epoch: 4   Global Step: 61060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 07:28:20,893-Speed 2983.81 samples/sec   Loss 11.1297   LearningRate 0.0569   Epoch: 4   Global Step: 61070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 07:28:24,240-Speed 3060.49 samples/sec   Loss 11.1981   LearningRate 0.0569   Epoch: 4   Global Step: 61080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:28:27,659-Speed 2995.72 samples/sec   Loss 11.2074   LearningRate 0.0569   Epoch: 4   Global Step: 61090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:28:31,079-Speed 2995.56 samples/sec   Loss 11.1309   LearningRate 0.0569   Epoch: 4   Global Step: 61100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:28:34,481-Speed 3010.51 samples/sec   Loss 11.0526   LearningRate 0.0569   Epoch: 4   Global Step: 61110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:28:37,884-Speed 3010.34 samples/sec   Loss 11.1642   LearningRate 0.0568   Epoch: 4   Global Step: 61120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 07:28:41,267-Speed 3028.14 samples/sec   Loss 11.0651   LearningRate 0.0568   Epoch: 4   Global Step: 61130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:28:44,611-Speed 3062.82 samples/sec   Loss 11.1802   LearningRate 0.0568   Epoch: 4   Global Step: 61140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:28:47,986-Speed 3035.16 samples/sec   Loss 11.2556   LearningRate 0.0568   Epoch: 4   Global Step: 61150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:28:51,370-Speed 3027.10 samples/sec   Loss 11.1775   LearningRate 0.0568   Epoch: 4   Global Step: 61160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:28:54,784-Speed 3000.00 samples/sec   Loss 10.9830   LearningRate 0.0568   Epoch: 4   Global Step: 61170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:28:58,205-Speed 2994.21 samples/sec   Loss 11.0995   LearningRate 0.0568   Epoch: 4   Global Step: 61180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:29:01,576-Speed 3038.85 samples/sec   Loss 11.1942   LearningRate 0.0568   Epoch: 4   Global Step: 61190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:29:05,007-Speed 2985.05 samples/sec   Loss 11.0662   LearningRate 0.0568   Epoch: 4   Global Step: 61200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:29:08,481-Speed 2948.43 samples/sec   Loss 11.0456   LearningRate 0.0568   Epoch: 4   Global Step: 61210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:29:11,890-Speed 3004.85 samples/sec   Loss 11.1161   LearningRate 0.0568   Epoch: 4   Global Step: 61220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:29:15,359-Speed 2952.89 samples/sec   Loss 11.1033   LearningRate 0.0568   Epoch: 4   Global Step: 61230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:29:18,736-Speed 3033.13 samples/sec   Loss 11.0022   LearningRate 0.0568   Epoch: 4   Global Step: 61240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:29:22,170-Speed 2982.85 samples/sec   Loss 10.9823   LearningRate 0.0568   Epoch: 4   Global Step: 61250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:29:25,625-Speed 2965.38 samples/sec   Loss 11.1470   LearningRate 0.0568   Epoch: 4   Global Step: 61260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:29:29,037-Speed 3002.04 samples/sec   Loss 10.9460   LearningRate 0.0568   Epoch: 4   Global Step: 61270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:29:32,490-Speed 2965.97 samples/sec   Loss 11.0930   LearningRate 0.0567   Epoch: 4   Global Step: 61280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:29:35,889-Speed 3014.11 samples/sec   Loss 11.1668   LearningRate 0.0567   Epoch: 4   Global Step: 61290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:29:39,250-Speed 3047.77 samples/sec   Loss 11.1518   LearningRate 0.0567   Epoch: 4   Global Step: 61300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:29:42,634-Speed 3026.47 samples/sec   Loss 11.2314   LearningRate 0.0567   Epoch: 4   Global Step: 61310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:29:46,029-Speed 3017.77 samples/sec   Loss 11.0954   LearningRate 0.0567   Epoch: 4   Global Step: 61320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:29:49,418-Speed 3022.34 samples/sec   Loss 11.1765   LearningRate 0.0567   Epoch: 4   Global Step: 61330   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:29:52,825-Speed 3005.78 samples/sec   Loss 11.2132   LearningRate 0.0567   Epoch: 4   Global Step: 61340   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:29:56,226-Speed 3012.12 samples/sec   Loss 11.0995   LearningRate 0.0567   Epoch: 4   Global Step: 61350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:29:59,618-Speed 3019.89 samples/sec   Loss 11.1593   LearningRate 0.0567   Epoch: 4   Global Step: 61360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:30:03,068-Speed 2969.20 samples/sec   Loss 11.0817   LearningRate 0.0567   Epoch: 4   Global Step: 61370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:30:06,416-Speed 3058.96 samples/sec   Loss 11.1802   LearningRate 0.0567   Epoch: 4   Global Step: 61380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:30:09,799-Speed 3027.97 samples/sec   Loss 11.0379   LearningRate 0.0567   Epoch: 4   Global Step: 61390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:30:13,251-Speed 2967.14 samples/sec   Loss 11.0088   LearningRate 0.0567   Epoch: 4   Global Step: 61400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:16,755-Speed 2923.19 samples/sec   Loss 11.0874   LearningRate 0.0567   Epoch: 4   Global Step: 61410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:20,091-Speed 3070.83 samples/sec   Loss 11.1335   LearningRate 0.0567   Epoch: 4   Global Step: 61420   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:23,466-Speed 3034.67 samples/sec   Loss 11.0737   LearningRate 0.0567   Epoch: 4   Global Step: 61430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:26,869-Speed 3010.00 samples/sec   Loss 10.9982   LearningRate 0.0567   Epoch: 4   Global Step: 61440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:30,242-Speed 3036.74 samples/sec   Loss 10.9013   LearningRate 0.0566   Epoch: 4   Global Step: 61450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:33,677-Speed 2982.20 samples/sec   Loss 11.0042   LearningRate 0.0566   Epoch: 4   Global Step: 61460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:37,024-Speed 3060.60 samples/sec   Loss 11.0231   LearningRate 0.0566   Epoch: 4   Global Step: 61470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:40,495-Speed 2950.84 samples/sec   Loss 11.1729   LearningRate 0.0566   Epoch: 4   Global Step: 61480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:43,867-Speed 3037.68 samples/sec   Loss 11.1574   LearningRate 0.0566   Epoch: 4   Global Step: 61490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:47,202-Speed 3071.55 samples/sec   Loss 11.0872   LearningRate 0.0566   Epoch: 4   Global Step: 61500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:50,659-Speed 2962.86 samples/sec   Loss 11.1720   LearningRate 0.0566   Epoch: 4   Global Step: 61510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:54,102-Speed 2975.02 samples/sec   Loss 11.1206   LearningRate 0.0566   Epoch: 4   Global Step: 61520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:30:57,481-Speed 3031.44 samples/sec   Loss 10.9862   LearningRate 0.0566   Epoch: 4   Global Step: 61530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:31:00,853-Speed 3037.77 samples/sec   Loss 10.9804   LearningRate 0.0566   Epoch: 4   Global Step: 61540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:31:04,233-Speed 3030.11 samples/sec   Loss 11.1287   LearningRate 0.0566   Epoch: 4   Global Step: 61550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:31:07,646-Speed 3001.89 samples/sec   Loss 10.9787   LearningRate 0.0566   Epoch: 4   Global Step: 61560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:31:11,033-Speed 3023.76 samples/sec   Loss 11.0283   LearningRate 0.0566   Epoch: 4   Global Step: 61570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:31:14,541-Speed 2919.96 samples/sec   Loss 11.1294   LearningRate 0.0566   Epoch: 4   Global Step: 61580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:31:17,980-Speed 2978.81 samples/sec   Loss 11.0197   LearningRate 0.0566   Epoch: 4   Global Step: 61590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:31:21,420-Speed 2977.53 samples/sec   Loss 11.2090   LearningRate 0.0566   Epoch: 4   Global Step: 61600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:31:24,786-Speed 3042.21 samples/sec   Loss 11.1417   LearningRate 0.0565   Epoch: 4   Global Step: 61610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:31:28,282-Speed 2929.88 samples/sec   Loss 11.0679   LearningRate 0.0565   Epoch: 4   Global Step: 61620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:31:31,657-Speed 3035.34 samples/sec   Loss 10.8976   LearningRate 0.0565   Epoch: 4   Global Step: 61630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:31:35,033-Speed 3033.74 samples/sec   Loss 11.0406   LearningRate 0.0565   Epoch: 4   Global Step: 61640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:31:38,432-Speed 3013.42 samples/sec   Loss 11.1043   LearningRate 0.0565   Epoch: 4   Global Step: 61650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:31:41,813-Speed 3030.16 samples/sec   Loss 11.0649   LearningRate 0.0565   Epoch: 4   Global Step: 61660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:31:45,148-Speed 3071.34 samples/sec   Loss 10.9366   LearningRate 0.0565   Epoch: 4   Global Step: 61670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:31:48,572-Speed 2991.08 samples/sec   Loss 11.0169   LearningRate 0.0565   Epoch: 4   Global Step: 61680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:31:51,923-Speed 3056.85 samples/sec   Loss 11.0757   LearningRate 0.0565   Epoch: 4   Global Step: 61690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:31:55,347-Speed 2991.55 samples/sec   Loss 11.0033   LearningRate 0.0565   Epoch: 4   Global Step: 61700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:31:58,756-Speed 3004.97 samples/sec   Loss 11.1916   LearningRate 0.0565   Epoch: 4   Global Step: 61710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:32:02,131-Speed 3034.86 samples/sec   Loss 11.3488   LearningRate 0.0565   Epoch: 4   Global Step: 61720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:32:05,446-Speed 3089.34 samples/sec   Loss 11.0641   LearningRate 0.0565   Epoch: 4   Global Step: 61730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:32:08,887-Speed 2977.53 samples/sec   Loss 11.0887   LearningRate 0.0565   Epoch: 4   Global Step: 61740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:32:12,289-Speed 3010.17 samples/sec   Loss 11.0607   LearningRate 0.0565   Epoch: 4   Global Step: 61750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:32:15,746-Speed 2963.12 samples/sec   Loss 11.1026   LearningRate 0.0565   Epoch: 4   Global Step: 61760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:32:19,111-Speed 3044.33 samples/sec   Loss 11.0546   LearningRate 0.0565   Epoch: 4   Global Step: 61770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:32:22,534-Speed 2992.65 samples/sec   Loss 11.0142   LearningRate 0.0564   Epoch: 4   Global Step: 61780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:32:25,968-Speed 2982.47 samples/sec   Loss 11.0530   LearningRate 0.0564   Epoch: 4   Global Step: 61790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:32:29,381-Speed 3001.04 samples/sec   Loss 11.1621   LearningRate 0.0564   Epoch: 4   Global Step: 61800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:32:32,879-Speed 2928.48 samples/sec   Loss 11.0021   LearningRate 0.0564   Epoch: 4   Global Step: 61810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:32:36,273-Speed 3017.06 samples/sec   Loss 11.0466   LearningRate 0.0564   Epoch: 4   Global Step: 61820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:32:39,647-Speed 3036.80 samples/sec   Loss 11.1266   LearningRate 0.0564   Epoch: 4   Global Step: 61830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:32:43,064-Speed 2997.13 samples/sec   Loss 11.1204   LearningRate 0.0564   Epoch: 4   Global Step: 61840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:32:46,447-Speed 3027.60 samples/sec   Loss 11.1243   LearningRate 0.0564   Epoch: 4   Global Step: 61850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:32:49,881-Speed 2983.39 samples/sec   Loss 11.0452   LearningRate 0.0564   Epoch: 4   Global Step: 61860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:32:53,305-Speed 2991.72 samples/sec   Loss 10.9924   LearningRate 0.0564   Epoch: 4   Global Step: 61870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:32:56,842-Speed 2896.96 samples/sec   Loss 11.2675   LearningRate 0.0564   Epoch: 4   Global Step: 61880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:33:00,184-Speed 3065.17 samples/sec   Loss 10.9703   LearningRate 0.0564   Epoch: 4   Global Step: 61890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:33:03,577-Speed 3018.36 samples/sec   Loss 10.9994   LearningRate 0.0564   Epoch: 4   Global Step: 61900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:33:06,992-Speed 3000.03 samples/sec   Loss 10.9538   LearningRate 0.0564   Epoch: 4   Global Step: 61910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:33:10,360-Speed 3040.95 samples/sec   Loss 11.1731   LearningRate 0.0564   Epoch: 4   Global Step: 61920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:33:13,858-Speed 2929.93 samples/sec   Loss 11.0406   LearningRate 0.0564   Epoch: 4   Global Step: 61930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:33:17,360-Speed 2924.21 samples/sec   Loss 11.0319   LearningRate 0.0563   Epoch: 4   Global Step: 61940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:33:20,738-Speed 3032.30 samples/sec   Loss 11.0258   LearningRate 0.0563   Epoch: 4   Global Step: 61950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:33:24,104-Speed 3042.94 samples/sec   Loss 11.1742   LearningRate 0.0563   Epoch: 4   Global Step: 61960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:33:27,503-Speed 3014.09 samples/sec   Loss 11.0447   LearningRate 0.0563   Epoch: 4   Global Step: 61970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:33:30,922-Speed 2995.65 samples/sec   Loss 11.0774   LearningRate 0.0563   Epoch: 4   Global Step: 61980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:33:34,273-Speed 3056.98 samples/sec   Loss 11.0531   LearningRate 0.0563   Epoch: 4   Global Step: 61990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:33:37,665-Speed 3019.51 samples/sec   Loss 11.0466   LearningRate 0.0563   Epoch: 4   Global Step: 62000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:33:41,083-Speed 2997.11 samples/sec   Loss 10.9762   LearningRate 0.0563   Epoch: 4   Global Step: 62010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:33:44,538-Speed 2964.62 samples/sec   Loss 11.0364   LearningRate 0.0563   Epoch: 4   Global Step: 62020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:33:47,919-Speed 3029.45 samples/sec   Loss 11.0337   LearningRate 0.0563   Epoch: 4   Global Step: 62030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:33:51,355-Speed 2980.89 samples/sec   Loss 10.9832   LearningRate 0.0563   Epoch: 4   Global Step: 62040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:33:54,762-Speed 3006.58 samples/sec   Loss 11.1033   LearningRate 0.0563   Epoch: 4   Global Step: 62050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:33:58,208-Speed 2972.60 samples/sec   Loss 10.9834   LearningRate 0.0563   Epoch: 4   Global Step: 62060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:34:01,662-Speed 2966.19 samples/sec   Loss 11.0649   LearningRate 0.0563   Epoch: 4   Global Step: 62070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:34:05,064-Speed 3010.35 samples/sec   Loss 11.0305   LearningRate 0.0563   Epoch: 4   Global Step: 62080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:34:08,478-Speed 3000.21 samples/sec   Loss 11.0374   LearningRate 0.0563   Epoch: 4   Global Step: 62090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:34:12,113-Speed 2818.60 samples/sec   Loss 11.0921   LearningRate 0.0563   Epoch: 4   Global Step: 62100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:34:44,975-Speed 311.61 samples/sec   Loss 10.3215   LearningRate 0.0562   Epoch: 5   Global Step: 62110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:34:48,768-Speed 2701.18 samples/sec   Loss 9.5626   LearningRate 0.0562   Epoch: 5   Global Step: 62120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:34:52,112-Speed 3062.74 samples/sec   Loss 9.5958   LearningRate 0.0562   Epoch: 5   Global Step: 62130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:34:55,573-Speed 2959.98 samples/sec   Loss 9.5816   LearningRate 0.0562   Epoch: 5   Global Step: 62140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:34:58,988-Speed 2999.98 samples/sec   Loss 9.4290   LearningRate 0.0562   Epoch: 5   Global Step: 62150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:35:02,423-Speed 2981.90 samples/sec   Loss 9.6085   LearningRate 0.0562   Epoch: 5   Global Step: 62160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:05,786-Speed 3045.78 samples/sec   Loss 9.5169   LearningRate 0.0562   Epoch: 5   Global Step: 62170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:09,194-Speed 3005.15 samples/sec   Loss 9.6375   LearningRate 0.0562   Epoch: 5   Global Step: 62180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:12,533-Speed 3068.40 samples/sec   Loss 9.6095   LearningRate 0.0562   Epoch: 5   Global Step: 62190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:15,905-Speed 3036.82 samples/sec   Loss 9.4291   LearningRate 0.0562   Epoch: 5   Global Step: 62200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:19,263-Speed 3050.83 samples/sec   Loss 9.6182   LearningRate 0.0562   Epoch: 5   Global Step: 62210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:22,570-Speed 3096.97 samples/sec   Loss 9.4552   LearningRate 0.0562   Epoch: 5   Global Step: 62220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:25,994-Speed 2991.18 samples/sec   Loss 9.4728   LearningRate 0.0562   Epoch: 5   Global Step: 62230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:29,446-Speed 2968.06 samples/sec   Loss 9.6476   LearningRate 0.0562   Epoch: 5   Global Step: 62240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:32,791-Speed 3061.93 samples/sec   Loss 9.7005   LearningRate 0.0562   Epoch: 5   Global Step: 62250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:36,151-Speed 3049.13 samples/sec   Loss 9.7295   LearningRate 0.0562   Epoch: 5   Global Step: 62260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:39,554-Speed 3009.71 samples/sec   Loss 9.7019   LearningRate 0.0562   Epoch: 5   Global Step: 62270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:42,972-Speed 2996.32 samples/sec   Loss 9.5800   LearningRate 0.0561   Epoch: 5   Global Step: 62280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:46,472-Speed 2926.86 samples/sec   Loss 9.6913   LearningRate 0.0561   Epoch: 5   Global Step: 62290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:35:49,887-Speed 2999.49 samples/sec   Loss 9.6377   LearningRate 0.0561   Epoch: 5   Global Step: 62300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:35:53,396-Speed 2918.96 samples/sec   Loss 9.6242   LearningRate 0.0561   Epoch: 5   Global Step: 62310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:35:56,726-Speed 3075.58 samples/sec   Loss 9.5635   LearningRate 0.0561   Epoch: 5   Global Step: 62320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:36:00,135-Speed 3005.32 samples/sec   Loss 9.6383   LearningRate 0.0561   Epoch: 5   Global Step: 62330   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:36:03,485-Speed 3057.54 samples/sec   Loss 9.8049   LearningRate 0.0561   Epoch: 5   Global Step: 62340   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:36:06,871-Speed 3024.89 samples/sec   Loss 9.6228   LearningRate 0.0561   Epoch: 5   Global Step: 62350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:36:10,238-Speed 3041.73 samples/sec   Loss 9.9038   LearningRate 0.0561   Epoch: 5   Global Step: 62360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:36:13,572-Speed 3072.37 samples/sec   Loss 9.7633   LearningRate 0.0561   Epoch: 5   Global Step: 62370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:36:16,951-Speed 3031.99 samples/sec   Loss 9.8192   LearningRate 0.0561   Epoch: 5   Global Step: 62380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:36:20,354-Speed 3009.13 samples/sec   Loss 9.7288   LearningRate 0.0561   Epoch: 5   Global Step: 62390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:36:23,768-Speed 3000.21 samples/sec   Loss 9.8726   LearningRate 0.0561   Epoch: 5   Global Step: 62400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:36:27,319-Speed 2884.85 samples/sec   Loss 9.8785   LearningRate 0.0561   Epoch: 5   Global Step: 62410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:36:30,705-Speed 3025.17 samples/sec   Loss 9.7871   LearningRate 0.0561   Epoch: 5   Global Step: 62420   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:36:34,312-Speed 2839.23 samples/sec   Loss 9.8195   LearningRate 0.0561   Epoch: 5   Global Step: 62430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:36:38,013-Speed 2767.52 samples/sec   Loss 9.7767   LearningRate 0.0560   Epoch: 5   Global Step: 62440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:36:41,456-Speed 2975.30 samples/sec   Loss 9.8623   LearningRate 0.0560   Epoch: 5   Global Step: 62450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:36:44,884-Speed 2988.63 samples/sec   Loss 9.7631   LearningRate 0.0560   Epoch: 5   Global Step: 62460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:36:48,349-Speed 2955.72 samples/sec   Loss 9.7628   LearningRate 0.0560   Epoch: 5   Global Step: 62470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:36:51,846-Speed 2928.78 samples/sec   Loss 9.8545   LearningRate 0.0560   Epoch: 5   Global Step: 62480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:36:55,262-Speed 2998.28 samples/sec   Loss 9.8812   LearningRate 0.0560   Epoch: 5   Global Step: 62490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:36:58,635-Speed 3036.89 samples/sec   Loss 9.6874   LearningRate 0.0560   Epoch: 5   Global Step: 62500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:02,029-Speed 3018.26 samples/sec   Loss 9.6779   LearningRate 0.0560   Epoch: 5   Global Step: 62510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:05,398-Speed 3040.71 samples/sec   Loss 9.7529   LearningRate 0.0560   Epoch: 5   Global Step: 62520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:08,776-Speed 3032.78 samples/sec   Loss 9.9488   LearningRate 0.0560   Epoch: 5   Global Step: 62530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:12,171-Speed 3017.21 samples/sec   Loss 9.9177   LearningRate 0.0560   Epoch: 5   Global Step: 62540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:15,629-Speed 2961.71 samples/sec   Loss 9.9251   LearningRate 0.0560   Epoch: 5   Global Step: 62550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:19,067-Speed 2979.45 samples/sec   Loss 9.9615   LearningRate 0.0560   Epoch: 5   Global Step: 62560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:22,466-Speed 3014.10 samples/sec   Loss 9.8064   LearningRate 0.0560   Epoch: 5   Global Step: 62570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:25,859-Speed 3018.21 samples/sec   Loss 9.7462   LearningRate 0.0560   Epoch: 5   Global Step: 62580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:29,243-Speed 3027.48 samples/sec   Loss 9.8390   LearningRate 0.0560   Epoch: 5   Global Step: 62590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:37:32,591-Speed 3059.40 samples/sec   Loss 9.7444   LearningRate 0.0560   Epoch: 5   Global Step: 62600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:35,939-Speed 3059.21 samples/sec   Loss 9.8240   LearningRate 0.0559   Epoch: 5   Global Step: 62610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:39,343-Speed 3009.17 samples/sec   Loss 9.9918   LearningRate 0.0559   Epoch: 5   Global Step: 62620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:42,751-Speed 3006.19 samples/sec   Loss 10.0691   LearningRate 0.0559   Epoch: 5   Global Step: 62630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:46,132-Speed 3029.53 samples/sec   Loss 10.0102   LearningRate 0.0559   Epoch: 5   Global Step: 62640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:49,510-Speed 3032.54 samples/sec   Loss 9.8886   LearningRate 0.0559   Epoch: 5   Global Step: 62650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:52,915-Speed 3007.86 samples/sec   Loss 9.9772   LearningRate 0.0559   Epoch: 5   Global Step: 62660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:56,405-Speed 2935.90 samples/sec   Loss 9.9107   LearningRate 0.0559   Epoch: 5   Global Step: 62670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:37:59,796-Speed 3020.32 samples/sec   Loss 9.9604   LearningRate 0.0559   Epoch: 5   Global Step: 62680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:38:03,184-Speed 3023.68 samples/sec   Loss 10.0702   LearningRate 0.0559   Epoch: 5   Global Step: 62690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:38:06,576-Speed 3019.27 samples/sec   Loss 9.8848   LearningRate 0.0559   Epoch: 5   Global Step: 62700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:38:09,967-Speed 3020.38 samples/sec   Loss 10.0451   LearningRate 0.0559   Epoch: 5   Global Step: 62710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:38:13,361-Speed 3018.04 samples/sec   Loss 10.1155   LearningRate 0.0559   Epoch: 5   Global Step: 62720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:38:16,795-Speed 2983.06 samples/sec   Loss 9.9659   LearningRate 0.0559   Epoch: 5   Global Step: 62730   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:38:20,213-Speed 2996.37 samples/sec   Loss 9.8999   LearningRate 0.0559   Epoch: 5   Global Step: 62740   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:38:23,638-Speed 2990.87 samples/sec   Loss 10.1011   LearningRate 0.0559   Epoch: 5   Global Step: 62750   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:38:27,027-Speed 3022.53 samples/sec   Loss 9.9075   LearningRate 0.0559   Epoch: 5   Global Step: 62760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:38:30,489-Speed 2958.49 samples/sec   Loss 10.0791   LearningRate 0.0558   Epoch: 5   Global Step: 62770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:38:33,923-Speed 2983.61 samples/sec   Loss 9.9984   LearningRate 0.0558   Epoch: 5   Global Step: 62780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:38:37,329-Speed 3007.50 samples/sec   Loss 9.8182   LearningRate 0.0558   Epoch: 5   Global Step: 62790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:38:40,758-Speed 2986.72 samples/sec   Loss 9.9068   LearningRate 0.0558   Epoch: 5   Global Step: 62800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:38:44,098-Speed 3067.43 samples/sec   Loss 10.0933   LearningRate 0.0558   Epoch: 5   Global Step: 62810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:38:47,399-Speed 3102.48 samples/sec   Loss 9.9042   LearningRate 0.0558   Epoch: 5   Global Step: 62820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:38:50,776-Speed 3033.27 samples/sec   Loss 10.0639   LearningRate 0.0558   Epoch: 5   Global Step: 62830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:38:54,234-Speed 2962.30 samples/sec   Loss 10.1064   LearningRate 0.0558   Epoch: 5   Global Step: 62840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:38:57,617-Speed 3028.14 samples/sec   Loss 10.0424   LearningRate 0.0558   Epoch: 5   Global Step: 62850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:01,045-Speed 2987.52 samples/sec   Loss 10.0327   LearningRate 0.0558   Epoch: 5   Global Step: 62860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:04,417-Speed 3037.93 samples/sec   Loss 9.9690   LearningRate 0.0558   Epoch: 5   Global Step: 62870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:07,843-Speed 2989.63 samples/sec   Loss 10.1914   LearningRate 0.0558   Epoch: 5   Global Step: 62880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:11,252-Speed 3004.48 samples/sec   Loss 9.9738   LearningRate 0.0558   Epoch: 5   Global Step: 62890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:14,644-Speed 3020.57 samples/sec   Loss 10.0844   LearningRate 0.0558   Epoch: 5   Global Step: 62900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:18,096-Speed 2967.29 samples/sec   Loss 10.0437   LearningRate 0.0558   Epoch: 5   Global Step: 62910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:21,457-Speed 3047.04 samples/sec   Loss 10.0016   LearningRate 0.0558   Epoch: 5   Global Step: 62920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:39:24,796-Speed 3067.41 samples/sec   Loss 10.0604   LearningRate 0.0558   Epoch: 5   Global Step: 62930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:39:28,164-Speed 3041.67 samples/sec   Loss 10.1423   LearningRate 0.0557   Epoch: 5   Global Step: 62940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:39:31,527-Speed 3045.70 samples/sec   Loss 10.2449   LearningRate 0.0557   Epoch: 5   Global Step: 62950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:39:34,847-Speed 3085.49 samples/sec   Loss 10.2750   LearningRate 0.0557   Epoch: 5   Global Step: 62960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:38,249-Speed 3010.88 samples/sec   Loss 10.1862   LearningRate 0.0557   Epoch: 5   Global Step: 62970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:41,632-Speed 3027.36 samples/sec   Loss 10.1229   LearningRate 0.0557   Epoch: 5   Global Step: 62980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:45,070-Speed 2979.58 samples/sec   Loss 10.1011   LearningRate 0.0557   Epoch: 5   Global Step: 62990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:48,471-Speed 3011.28 samples/sec   Loss 10.0860   LearningRate 0.0557   Epoch: 5   Global Step: 63000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:51,844-Speed 3036.64 samples/sec   Loss 10.0921   LearningRate 0.0557   Epoch: 5   Global Step: 63010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:55,256-Speed 3001.79 samples/sec   Loss 10.1756   LearningRate 0.0557   Epoch: 5   Global Step: 63020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:39:58,603-Speed 3060.30 samples/sec   Loss 10.0426   LearningRate 0.0557   Epoch: 5   Global Step: 63030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:02,026-Speed 2992.50 samples/sec   Loss 10.3137   LearningRate 0.0557   Epoch: 5   Global Step: 63040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:05,406-Speed 3030.77 samples/sec   Loss 10.2669   LearningRate 0.0557   Epoch: 5   Global Step: 63050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:08,784-Speed 3031.75 samples/sec   Loss 10.2827   LearningRate 0.0557   Epoch: 5   Global Step: 63060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:40:12,140-Speed 3052.68 samples/sec   Loss 10.0795   LearningRate 0.0557   Epoch: 5   Global Step: 63070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:15,486-Speed 3060.99 samples/sec   Loss 10.0346   LearningRate 0.0557   Epoch: 5   Global Step: 63080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:18,941-Speed 2965.23 samples/sec   Loss 10.1577   LearningRate 0.0557   Epoch: 5   Global Step: 63090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:22,268-Speed 3079.08 samples/sec   Loss 10.1251   LearningRate 0.0557   Epoch: 5   Global Step: 63100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:25,626-Speed 3049.63 samples/sec   Loss 10.2392   LearningRate 0.0556   Epoch: 5   Global Step: 63110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:29,034-Speed 3005.42 samples/sec   Loss 10.3909   LearningRate 0.0556   Epoch: 5   Global Step: 63120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:32,395-Speed 3047.97 samples/sec   Loss 10.2874   LearningRate 0.0556   Epoch: 5   Global Step: 63130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:35,827-Speed 2984.13 samples/sec   Loss 10.1617   LearningRate 0.0556   Epoch: 5   Global Step: 63140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:39,242-Speed 2999.92 samples/sec   Loss 10.2156   LearningRate 0.0556   Epoch: 5   Global Step: 63150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:42,656-Speed 3000.83 samples/sec   Loss 10.1042   LearningRate 0.0556   Epoch: 5   Global Step: 63160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:46,062-Speed 3007.17 samples/sec   Loss 10.0339   LearningRate 0.0556   Epoch: 5   Global Step: 63170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:40:49,566-Speed 2923.64 samples/sec   Loss 10.1711   LearningRate 0.0556   Epoch: 5   Global Step: 63180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:40:52,927-Speed 3047.24 samples/sec   Loss 10.0814   LearningRate 0.0556   Epoch: 5   Global Step: 63190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:56,406-Speed 2943.77 samples/sec   Loss 10.2197   LearningRate 0.0556   Epoch: 5   Global Step: 63200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:40:59,803-Speed 3015.48 samples/sec   Loss 10.1238   LearningRate 0.0556   Epoch: 5   Global Step: 63210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:41:03,213-Speed 3003.59 samples/sec   Loss 10.3654   LearningRate 0.0556   Epoch: 5   Global Step: 63220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:41:06,651-Speed 2980.13 samples/sec   Loss 10.3771   LearningRate 0.0556   Epoch: 5   Global Step: 63230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:41:10,042-Speed 3020.12 samples/sec   Loss 10.2834   LearningRate 0.0556   Epoch: 5   Global Step: 63240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:41:13,373-Speed 3075.79 samples/sec   Loss 10.1849   LearningRate 0.0556   Epoch: 5   Global Step: 63250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:41:16,764-Speed 3020.49 samples/sec   Loss 10.1723   LearningRate 0.0556   Epoch: 5   Global Step: 63260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:41:20,114-Speed 3058.07 samples/sec   Loss 10.4094   LearningRate 0.0555   Epoch: 5   Global Step: 63270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:41:23,531-Speed 2997.64 samples/sec   Loss 10.2561   LearningRate 0.0555   Epoch: 5   Global Step: 63280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:41:26,879-Speed 3059.02 samples/sec   Loss 10.2356   LearningRate 0.0555   Epoch: 5   Global Step: 63290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:41:30,223-Speed 3063.02 samples/sec   Loss 10.1762   LearningRate 0.0555   Epoch: 5   Global Step: 63300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:41:33,567-Speed 3063.55 samples/sec   Loss 10.1985   LearningRate 0.0555   Epoch: 5   Global Step: 63310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:41:36,990-Speed 2991.92 samples/sec   Loss 10.1745   LearningRate 0.0555   Epoch: 5   Global Step: 63320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:41:40,346-Speed 3052.34 samples/sec   Loss 10.3111   LearningRate 0.0555   Epoch: 5   Global Step: 63330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:41:43,719-Speed 3036.50 samples/sec   Loss 10.1673   LearningRate 0.0555   Epoch: 5   Global Step: 63340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:41:47,056-Speed 3069.53 samples/sec   Loss 10.1893   LearningRate 0.0555   Epoch: 5   Global Step: 63350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:41:50,482-Speed 2989.77 samples/sec   Loss 10.2655   LearningRate 0.0555   Epoch: 5   Global Step: 63360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:41:53,942-Speed 2960.18 samples/sec   Loss 10.2569   LearningRate 0.0555   Epoch: 5   Global Step: 63370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:41:57,346-Speed 3009.65 samples/sec   Loss 10.4852   LearningRate 0.0555   Epoch: 5   Global Step: 63380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:42:00,719-Speed 3035.89 samples/sec   Loss 10.5209   LearningRate 0.0555   Epoch: 5   Global Step: 63390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:42:04,148-Speed 2987.35 samples/sec   Loss 10.4006   LearningRate 0.0555   Epoch: 5   Global Step: 63400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:42:07,554-Speed 3007.58 samples/sec   Loss 10.3950   LearningRate 0.0555   Epoch: 5   Global Step: 63410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:42:10,960-Speed 3007.07 samples/sec   Loss 10.4323   LearningRate 0.0555   Epoch: 5   Global Step: 63420   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:42:14,332-Speed 3037.76 samples/sec   Loss 10.2640   LearningRate 0.0555   Epoch: 5   Global Step: 63430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:42:17,665-Speed 3073.94 samples/sec   Loss 10.2056   LearningRate 0.0554   Epoch: 5   Global Step: 63440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:42:21,127-Speed 2957.97 samples/sec   Loss 10.4892   LearningRate 0.0554   Epoch: 5   Global Step: 63450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:42:24,511-Speed 3027.60 samples/sec   Loss 10.3815   LearningRate 0.0554   Epoch: 5   Global Step: 63460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:42:27,861-Speed 3056.72 samples/sec   Loss 10.4426   LearningRate 0.0554   Epoch: 5   Global Step: 63470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:42:31,192-Speed 3075.20 samples/sec   Loss 10.2695   LearningRate 0.0554   Epoch: 5   Global Step: 63480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:42:34,631-Speed 2978.77 samples/sec   Loss 10.2664   LearningRate 0.0554   Epoch: 5   Global Step: 63490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:42:37,993-Speed 3046.46 samples/sec   Loss 10.3140   LearningRate 0.0554   Epoch: 5   Global Step: 63500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:42:41,423-Speed 2986.43 samples/sec   Loss 10.4192   LearningRate 0.0554   Epoch: 5   Global Step: 63510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:42:44,791-Speed 3041.14 samples/sec   Loss 10.2234   LearningRate 0.0554   Epoch: 5   Global Step: 63520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:42:48,165-Speed 3036.31 samples/sec   Loss 10.2599   LearningRate 0.0554   Epoch: 5   Global Step: 63530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:42:51,546-Speed 3029.40 samples/sec   Loss 10.2370   LearningRate 0.0554   Epoch: 5   Global Step: 63540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:42:54,858-Speed 3092.27 samples/sec   Loss 10.4867   LearningRate 0.0554   Epoch: 5   Global Step: 63550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:42:58,227-Speed 3040.54 samples/sec   Loss 10.3328   LearningRate 0.0554   Epoch: 5   Global Step: 63560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:43:01,648-Speed 2994.19 samples/sec   Loss 10.3742   LearningRate 0.0554   Epoch: 5   Global Step: 63570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:43:05,075-Speed 2989.26 samples/sec   Loss 10.4058   LearningRate 0.0554   Epoch: 5   Global Step: 63580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:43:08,437-Speed 3046.96 samples/sec   Loss 10.3098   LearningRate 0.0554   Epoch: 5   Global Step: 63590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:43:11,773-Speed 3070.14 samples/sec   Loss 10.4993   LearningRate 0.0554   Epoch: 5   Global Step: 63600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:43:15,120-Speed 3060.56 samples/sec   Loss 10.4242   LearningRate 0.0553   Epoch: 5   Global Step: 63610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:43:18,484-Speed 3045.30 samples/sec   Loss 10.5310   LearningRate 0.0553   Epoch: 5   Global Step: 63620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:43:21,917-Speed 2983.49 samples/sec   Loss 10.3320   LearningRate 0.0553   Epoch: 5   Global Step: 63630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:43:25,280-Speed 3045.68 samples/sec   Loss 10.4068   LearningRate 0.0553   Epoch: 5   Global Step: 63640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:43:28,588-Speed 3096.97 samples/sec   Loss 10.4640   LearningRate 0.0553   Epoch: 5   Global Step: 63650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:43:31,963-Speed 3034.76 samples/sec   Loss 10.5032   LearningRate 0.0553   Epoch: 5   Global Step: 63660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:43:35,382-Speed 2995.40 samples/sec   Loss 10.2540   LearningRate 0.0553   Epoch: 5   Global Step: 63670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:43:38,755-Speed 3037.53 samples/sec   Loss 10.3982   LearningRate 0.0553   Epoch: 5   Global Step: 63680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:43:42,151-Speed 3015.21 samples/sec   Loss 10.4053   LearningRate 0.0553   Epoch: 5   Global Step: 63690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:43:45,463-Speed 3093.32 samples/sec   Loss 10.4485   LearningRate 0.0553   Epoch: 5   Global Step: 63700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:43:48,788-Speed 3080.34 samples/sec   Loss 10.4929   LearningRate 0.0553   Epoch: 5   Global Step: 63710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:43:52,188-Speed 3013.12 samples/sec   Loss 10.5086   LearningRate 0.0553   Epoch: 5   Global Step: 63720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:43:55,579-Speed 3020.74 samples/sec   Loss 10.4957   LearningRate 0.0553   Epoch: 5   Global Step: 63730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:43:58,933-Speed 3053.89 samples/sec   Loss 10.5494   LearningRate 0.0553   Epoch: 5   Global Step: 63740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:44:02,356-Speed 2992.08 samples/sec   Loss 10.5097   LearningRate 0.0553   Epoch: 5   Global Step: 63750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:44:05,754-Speed 3014.79 samples/sec   Loss 10.2893   LearningRate 0.0553   Epoch: 5   Global Step: 63760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:44:09,202-Speed 2970.93 samples/sec   Loss 10.4935   LearningRate 0.0552   Epoch: 5   Global Step: 63770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:44:12,603-Speed 3011.33 samples/sec   Loss 10.4717   LearningRate 0.0552   Epoch: 5   Global Step: 63780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:44:16,093-Speed 2935.10 samples/sec   Loss 10.4197   LearningRate 0.0552   Epoch: 5   Global Step: 63790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:44:19,537-Speed 2974.01 samples/sec   Loss 10.4784   LearningRate 0.0552   Epoch: 5   Global Step: 63800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:44:23,025-Speed 2937.07 samples/sec   Loss 10.3926   LearningRate 0.0552   Epoch: 5   Global Step: 63810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:44:26,426-Speed 3011.00 samples/sec   Loss 10.4959   LearningRate 0.0552   Epoch: 5   Global Step: 63820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:44:29,884-Speed 2962.49 samples/sec   Loss 10.5062   LearningRate 0.0552   Epoch: 5   Global Step: 63830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:44:33,214-Speed 3075.62 samples/sec   Loss 10.5311   LearningRate 0.0552   Epoch: 5   Global Step: 63840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:44:36,624-Speed 3004.35 samples/sec   Loss 10.4073   LearningRate 0.0552   Epoch: 5   Global Step: 63850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:44:40,086-Speed 2958.59 samples/sec   Loss 10.6511   LearningRate 0.0552   Epoch: 5   Global Step: 63860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:44:43,446-Speed 3049.02 samples/sec   Loss 10.6344   LearningRate 0.0552   Epoch: 5   Global Step: 63870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:44:46,840-Speed 3017.28 samples/sec   Loss 10.5207   LearningRate 0.0552   Epoch: 5   Global Step: 63880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:44:50,253-Speed 3001.58 samples/sec   Loss 10.5527   LearningRate 0.0552   Epoch: 5   Global Step: 63890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:44:53,698-Speed 2973.08 samples/sec   Loss 10.6098   LearningRate 0.0552   Epoch: 5   Global Step: 63900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:44:57,069-Speed 3038.41 samples/sec   Loss 10.5222   LearningRate 0.0552   Epoch: 5   Global Step: 63910   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:45:00,464-Speed 3017.55 samples/sec   Loss 10.5640   LearningRate 0.0552   Epoch: 5   Global Step: 63920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:45:03,896-Speed 2983.95 samples/sec   Loss 10.5806   LearningRate 0.0552   Epoch: 5   Global Step: 63930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:45:07,267-Speed 3038.86 samples/sec   Loss 10.4607   LearningRate 0.0551   Epoch: 5   Global Step: 63940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:45:10,674-Speed 3006.66 samples/sec   Loss 10.4725   LearningRate 0.0551   Epoch: 5   Global Step: 63950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:45:14,071-Speed 3014.91 samples/sec   Loss 10.4276   LearningRate 0.0551   Epoch: 5   Global Step: 63960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:45:17,417-Speed 3060.97 samples/sec   Loss 10.4739   LearningRate 0.0551   Epoch: 5   Global Step: 63970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:45:20,745-Speed 3078.87 samples/sec   Loss 10.4910   LearningRate 0.0551   Epoch: 5   Global Step: 63980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:45:24,062-Speed 3087.32 samples/sec   Loss 10.5808   LearningRate 0.0551   Epoch: 5   Global Step: 63990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:45:27,413-Speed 3057.32 samples/sec   Loss 10.5820   LearningRate 0.0551   Epoch: 5   Global Step: 64000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:45:30,808-Speed 3016.67 samples/sec   Loss 10.5257   LearningRate 0.0551   Epoch: 5   Global Step: 64010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:45:34,137-Speed 3076.75 samples/sec   Loss 10.5586   LearningRate 0.0551   Epoch: 5   Global Step: 64020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:45:37,645-Speed 2920.46 samples/sec   Loss 10.6684   LearningRate 0.0551   Epoch: 5   Global Step: 64030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:45:41,041-Speed 3016.51 samples/sec   Loss 10.3975   LearningRate 0.0551   Epoch: 5   Global Step: 64040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:45:44,368-Speed 3078.19 samples/sec   Loss 10.5731   LearningRate 0.0551   Epoch: 5   Global Step: 64050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:45:47,790-Speed 2993.89 samples/sec   Loss 10.5381   LearningRate 0.0551   Epoch: 5   Global Step: 64060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:45:51,223-Speed 2983.33 samples/sec   Loss 10.4621   LearningRate 0.0551   Epoch: 5   Global Step: 64070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:45:54,655-Speed 2984.52 samples/sec   Loss 10.4919   LearningRate 0.0551   Epoch: 5   Global Step: 64080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:45:58,094-Speed 2978.77 samples/sec   Loss 10.5207   LearningRate 0.0551   Epoch: 5   Global Step: 64090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:46:01,471-Speed 3033.01 samples/sec   Loss 10.5132   LearningRate 0.0551   Epoch: 5   Global Step: 64100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:46:04,893-Speed 2993.60 samples/sec   Loss 10.4540   LearningRate 0.0550   Epoch: 5   Global Step: 64110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:46:08,284-Speed 3020.41 samples/sec   Loss 10.6119   LearningRate 0.0550   Epoch: 5   Global Step: 64120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:46:11,742-Speed 2961.85 samples/sec   Loss 10.6403   LearningRate 0.0550   Epoch: 5   Global Step: 64130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:46:15,251-Speed 2919.52 samples/sec   Loss 10.5548   LearningRate 0.0550   Epoch: 5   Global Step: 64140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:46:18,605-Speed 3053.39 samples/sec   Loss 10.5986   LearningRate 0.0550   Epoch: 5   Global Step: 64150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:46:21,921-Speed 3089.06 samples/sec   Loss 10.5812   LearningRate 0.0550   Epoch: 5   Global Step: 64160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:46:25,323-Speed 3011.15 samples/sec   Loss 10.4670   LearningRate 0.0550   Epoch: 5   Global Step: 64170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:46:28,764-Speed 2976.94 samples/sec   Loss 10.5403   LearningRate 0.0550   Epoch: 5   Global Step: 64180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:46:32,154-Speed 3021.80 samples/sec   Loss 10.6167   LearningRate 0.0550   Epoch: 5   Global Step: 64190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:46:35,590-Speed 2980.57 samples/sec   Loss 10.5670   LearningRate 0.0550   Epoch: 5   Global Step: 64200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:46:38,999-Speed 3005.21 samples/sec   Loss 10.6290   LearningRate 0.0550   Epoch: 5   Global Step: 64210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:46:42,319-Speed 3084.83 samples/sec   Loss 10.5219   LearningRate 0.0550   Epoch: 5   Global Step: 64220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:46:45,748-Speed 2987.08 samples/sec   Loss 10.5588   LearningRate 0.0550   Epoch: 5   Global Step: 64230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:46:49,106-Speed 3051.31 samples/sec   Loss 10.5975   LearningRate 0.0550   Epoch: 5   Global Step: 64240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:46:52,497-Speed 3020.56 samples/sec   Loss 10.6644   LearningRate 0.0550   Epoch: 5   Global Step: 64250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:46:55,880-Speed 3027.63 samples/sec   Loss 10.7322   LearningRate 0.0550   Epoch: 5   Global Step: 64260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:46:59,266-Speed 3025.56 samples/sec   Loss 10.5495   LearningRate 0.0550   Epoch: 5   Global Step: 64270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:47:02,587-Speed 3083.80 samples/sec   Loss 10.5599   LearningRate 0.0549   Epoch: 5   Global Step: 64280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:47:05,956-Speed 3040.31 samples/sec   Loss 10.7597   LearningRate 0.0549   Epoch: 5   Global Step: 64290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:47:09,298-Speed 3065.40 samples/sec   Loss 10.4944   LearningRate 0.0549   Epoch: 5   Global Step: 64300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:47:12,607-Speed 3095.41 samples/sec   Loss 10.7199   LearningRate 0.0549   Epoch: 5   Global Step: 64310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:47:16,112-Speed 2922.08 samples/sec   Loss 10.6751   LearningRate 0.0549   Epoch: 5   Global Step: 64320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:47:19,419-Speed 3097.27 samples/sec   Loss 10.5976   LearningRate 0.0549   Epoch: 5   Global Step: 64330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:47:22,774-Speed 3053.52 samples/sec   Loss 10.5441   LearningRate 0.0549   Epoch: 5   Global Step: 64340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:47:26,126-Speed 3055.65 samples/sec   Loss 10.6758   LearningRate 0.0549   Epoch: 5   Global Step: 64350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:47:29,493-Speed 3042.46 samples/sec   Loss 10.6265   LearningRate 0.0549   Epoch: 5   Global Step: 64360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:47:32,877-Speed 3027.14 samples/sec   Loss 10.6009   LearningRate 0.0549   Epoch: 5   Global Step: 64370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:47:36,250-Speed 3036.02 samples/sec   Loss 10.5105   LearningRate 0.0549   Epoch: 5   Global Step: 64380   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-04-27 07:47:39,564-Speed 3091.70 samples/sec   Loss 10.7475   LearningRate 0.0549   Epoch: 5   Global Step: 64390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:47:42,958-Speed 3017.90 samples/sec   Loss 10.6747   LearningRate 0.0549   Epoch: 5   Global Step: 64400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:47:46,363-Speed 3007.84 samples/sec   Loss 10.5406   LearningRate 0.0549   Epoch: 5   Global Step: 64410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:47:49,754-Speed 3021.12 samples/sec   Loss 10.6328   LearningRate 0.0549   Epoch: 5   Global Step: 64420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:47:53,067-Speed 3091.76 samples/sec   Loss 10.5414   LearningRate 0.0549   Epoch: 5   Global Step: 64430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:47:56,439-Speed 3037.21 samples/sec   Loss 10.7676   LearningRate 0.0548   Epoch: 5   Global Step: 64440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:47:59,791-Speed 3056.24 samples/sec   Loss 10.5226   LearningRate 0.0548   Epoch: 5   Global Step: 64450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:48:03,128-Speed 3069.57 samples/sec   Loss 10.6756   LearningRate 0.0548   Epoch: 5   Global Step: 64460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:48:06,591-Speed 2957.34 samples/sec   Loss 10.5627   LearningRate 0.0548   Epoch: 5   Global Step: 64470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:48:09,984-Speed 3019.20 samples/sec   Loss 10.4827   LearningRate 0.0548   Epoch: 5   Global Step: 64480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:48:13,411-Speed 2989.16 samples/sec   Loss 10.6297   LearningRate 0.0548   Epoch: 5   Global Step: 64490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:48:16,827-Speed 2998.01 samples/sec   Loss 10.6500   LearningRate 0.0548   Epoch: 5   Global Step: 64500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 07:48:20,273-Speed 2973.14 samples/sec   Loss 10.6303   LearningRate 0.0548   Epoch: 5   Global Step: 64510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:48:23,584-Speed 3092.59 samples/sec   Loss 10.6915   LearningRate 0.0548   Epoch: 5   Global Step: 64520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:48:26,991-Speed 3006.83 samples/sec   Loss 10.5980   LearningRate 0.0548   Epoch: 5   Global Step: 64530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:48:30,436-Speed 2973.29 samples/sec   Loss 10.8288   LearningRate 0.0548   Epoch: 5   Global Step: 64540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:48:33,850-Speed 3000.13 samples/sec   Loss 10.6002   LearningRate 0.0548   Epoch: 5   Global Step: 64550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:48:37,200-Speed 3057.76 samples/sec   Loss 10.5768   LearningRate 0.0548   Epoch: 5   Global Step: 64560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:48:40,680-Speed 2943.23 samples/sec   Loss 10.6320   LearningRate 0.0548   Epoch: 5   Global Step: 64570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:48:44,077-Speed 3015.31 samples/sec   Loss 10.5357   LearningRate 0.0548   Epoch: 5   Global Step: 64580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:48:47,490-Speed 3001.72 samples/sec   Loss 10.6220   LearningRate 0.0548   Epoch: 5   Global Step: 64590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:48:50,901-Speed 3003.30 samples/sec   Loss 10.7337   LearningRate 0.0548   Epoch: 5   Global Step: 64600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:48:54,346-Speed 2972.88 samples/sec   Loss 10.6445   LearningRate 0.0547   Epoch: 5   Global Step: 64610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:48:57,690-Speed 3063.38 samples/sec   Loss 10.7399   LearningRate 0.0547   Epoch: 5   Global Step: 64620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:49:01,063-Speed 3036.99 samples/sec   Loss 10.7357   LearningRate 0.0547   Epoch: 5   Global Step: 64630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:49:04,460-Speed 3015.31 samples/sec   Loss 10.6797   LearningRate 0.0547   Epoch: 5   Global Step: 64640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:49:07,892-Speed 2984.50 samples/sec   Loss 10.6272   LearningRate 0.0547   Epoch: 5   Global Step: 64650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:49:11,265-Speed 3037.73 samples/sec   Loss 10.7166   LearningRate 0.0547   Epoch: 5   Global Step: 64660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:49:14,639-Speed 3035.64 samples/sec   Loss 10.6290   LearningRate 0.0547   Epoch: 5   Global Step: 64670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:49:18,038-Speed 3013.16 samples/sec   Loss 10.5954   LearningRate 0.0547   Epoch: 5   Global Step: 64680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:49:21,398-Speed 3049.51 samples/sec   Loss 10.6760   LearningRate 0.0547   Epoch: 5   Global Step: 64690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:49:24,854-Speed 2963.39 samples/sec   Loss 10.6961   LearningRate 0.0547   Epoch: 5   Global Step: 64700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:49:28,234-Speed 3030.82 samples/sec   Loss 10.6296   LearningRate 0.0547   Epoch: 5   Global Step: 64710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:49:31,636-Speed 3010.95 samples/sec   Loss 10.6559   LearningRate 0.0547   Epoch: 5   Global Step: 64720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:49:34,994-Speed 3049.71 samples/sec   Loss 10.6398   LearningRate 0.0547   Epoch: 5   Global Step: 64730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:49:38,448-Speed 2965.74 samples/sec   Loss 10.6571   LearningRate 0.0547   Epoch: 5   Global Step: 64740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:49:41,891-Speed 2974.71 samples/sec   Loss 10.7311   LearningRate 0.0547   Epoch: 5   Global Step: 64750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:49:45,241-Speed 3057.71 samples/sec   Loss 10.5822   LearningRate 0.0547   Epoch: 5   Global Step: 64760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:49:48,714-Speed 2949.42 samples/sec   Loss 10.6384   LearningRate 0.0547   Epoch: 5   Global Step: 64770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:49:52,171-Speed 2963.78 samples/sec   Loss 10.6765   LearningRate 0.0546   Epoch: 5   Global Step: 64780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:49:55,538-Speed 3041.61 samples/sec   Loss 10.5933   LearningRate 0.0546   Epoch: 5   Global Step: 64790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:49:58,911-Speed 3036.87 samples/sec   Loss 10.7680   LearningRate 0.0546   Epoch: 5   Global Step: 64800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:50:02,353-Speed 2976.46 samples/sec   Loss 10.7506   LearningRate 0.0546   Epoch: 5   Global Step: 64810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:50:05,781-Speed 2987.86 samples/sec   Loss 10.6981   LearningRate 0.0546   Epoch: 5   Global Step: 64820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:50:09,211-Speed 2985.93 samples/sec   Loss 10.5830   LearningRate 0.0546   Epoch: 5   Global Step: 64830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:50:12,625-Speed 3000.43 samples/sec   Loss 10.7202   LearningRate 0.0546   Epoch: 5   Global Step: 64840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:50:16,019-Speed 3017.80 samples/sec   Loss 10.7496   LearningRate 0.0546   Epoch: 5   Global Step: 64850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:50:19,424-Speed 3008.99 samples/sec   Loss 10.8133   LearningRate 0.0546   Epoch: 5   Global Step: 64860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:50:22,919-Speed 2930.51 samples/sec   Loss 10.6322   LearningRate 0.0546   Epoch: 5   Global Step: 64870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:50:26,249-Speed 3076.18 samples/sec   Loss 10.7574   LearningRate 0.0546   Epoch: 5   Global Step: 64880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:50:29,567-Speed 3087.29 samples/sec   Loss 10.9369   LearningRate 0.0546   Epoch: 5   Global Step: 64890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:50:32,934-Speed 3041.60 samples/sec   Loss 10.7420   LearningRate 0.0546   Epoch: 5   Global Step: 64900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:50:36,290-Speed 3052.02 samples/sec   Loss 10.6949   LearningRate 0.0546   Epoch: 5   Global Step: 64910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:50:39,640-Speed 3058.09 samples/sec   Loss 10.7330   LearningRate 0.0546   Epoch: 5   Global Step: 64920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:50:43,038-Speed 3014.16 samples/sec   Loss 10.8981   LearningRate 0.0546   Epoch: 5   Global Step: 64930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:50:46,418-Speed 3030.26 samples/sec   Loss 10.6026   LearningRate 0.0546   Epoch: 5   Global Step: 64940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:50:49,767-Speed 3058.65 samples/sec   Loss 10.4884   LearningRate 0.0545   Epoch: 5   Global Step: 64950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:50:53,155-Speed 3023.88 samples/sec   Loss 10.7805   LearningRate 0.0545   Epoch: 5   Global Step: 64960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:50:56,489-Speed 3072.42 samples/sec   Loss 10.6522   LearningRate 0.0545   Epoch: 5   Global Step: 64970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:50:59,867-Speed 3032.23 samples/sec   Loss 10.7310   LearningRate 0.0545   Epoch: 5   Global Step: 64980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:51:03,222-Speed 3053.72 samples/sec   Loss 10.6713   LearningRate 0.0545   Epoch: 5   Global Step: 64990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:51:06,620-Speed 3013.72 samples/sec   Loss 10.6581   LearningRate 0.0545   Epoch: 5   Global Step: 65000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:51:10,011-Speed 3021.22 samples/sec   Loss 10.6048   LearningRate 0.0545   Epoch: 5   Global Step: 65010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:51:13,376-Speed 3043.75 samples/sec   Loss 10.5077   LearningRate 0.0545   Epoch: 5   Global Step: 65020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:51:16,745-Speed 3040.64 samples/sec   Loss 10.6764   LearningRate 0.0545   Epoch: 5   Global Step: 65030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:51:20,175-Speed 2986.55 samples/sec   Loss 10.7924   LearningRate 0.0545   Epoch: 5   Global Step: 65040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:51:23,592-Speed 2997.49 samples/sec   Loss 10.8096   LearningRate 0.0545   Epoch: 5   Global Step: 65050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:51:27,044-Speed 2967.37 samples/sec   Loss 10.8178   LearningRate 0.0545   Epoch: 5   Global Step: 65060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:51:30,529-Speed 2939.23 samples/sec   Loss 10.6211   LearningRate 0.0545   Epoch: 5   Global Step: 65070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:51:33,894-Speed 3044.51 samples/sec   Loss 10.7413   LearningRate 0.0545   Epoch: 5   Global Step: 65080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:51:37,248-Speed 3053.52 samples/sec   Loss 10.7276   LearningRate 0.0545   Epoch: 5   Global Step: 65090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:51:40,598-Speed 3057.62 samples/sec   Loss 10.7941   LearningRate 0.0545   Epoch: 5   Global Step: 65100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:51:44,056-Speed 2962.62 samples/sec   Loss 10.6953   LearningRate 0.0545   Epoch: 5   Global Step: 65110   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:51:47,421-Speed 3043.51 samples/sec   Loss 10.7406   LearningRate 0.0544   Epoch: 5   Global Step: 65120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:51:50,791-Speed 3039.92 samples/sec   Loss 10.6936   LearningRate 0.0544   Epoch: 5   Global Step: 65130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:51:54,107-Speed 3089.35 samples/sec   Loss 10.7246   LearningRate 0.0544   Epoch: 5   Global Step: 65140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:51:57,492-Speed 3025.52 samples/sec   Loss 10.6966   LearningRate 0.0544   Epoch: 5   Global Step: 65150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:00,824-Speed 3074.68 samples/sec   Loss 10.6426   LearningRate 0.0544   Epoch: 5   Global Step: 65160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:04,163-Speed 3067.22 samples/sec   Loss 10.6004   LearningRate 0.0544   Epoch: 5   Global Step: 65170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:07,475-Speed 3093.26 samples/sec   Loss 10.8649   LearningRate 0.0544   Epoch: 5   Global Step: 65180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:10,762-Speed 3117.03 samples/sec   Loss 10.7255   LearningRate 0.0544   Epoch: 5   Global Step: 65190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:14,092-Speed 3075.12 samples/sec   Loss 10.7315   LearningRate 0.0544   Epoch: 5   Global Step: 65200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:17,521-Speed 2987.17 samples/sec   Loss 10.7873   LearningRate 0.0544   Epoch: 5   Global Step: 65210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:20,925-Speed 3009.40 samples/sec   Loss 10.6691   LearningRate 0.0544   Epoch: 5   Global Step: 65220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:24,347-Speed 2993.74 samples/sec   Loss 10.7433   LearningRate 0.0544   Epoch: 5   Global Step: 65230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:27,750-Speed 3009.59 samples/sec   Loss 10.7275   LearningRate 0.0544   Epoch: 5   Global Step: 65240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:31,102-Speed 3056.74 samples/sec   Loss 10.7189   LearningRate 0.0544   Epoch: 5   Global Step: 65250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:34,453-Speed 3056.26 samples/sec   Loss 10.7449   LearningRate 0.0544   Epoch: 5   Global Step: 65260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:37,831-Speed 3032.81 samples/sec   Loss 10.6309   LearningRate 0.0544   Epoch: 5   Global Step: 65270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:41,217-Speed 3024.51 samples/sec   Loss 10.5987   LearningRate 0.0543   Epoch: 5   Global Step: 65280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:44,574-Speed 3051.35 samples/sec   Loss 10.8643   LearningRate 0.0543   Epoch: 5   Global Step: 65290   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-04-27 07:52:47,888-Speed 3090.93 samples/sec   Loss 10.8580   LearningRate 0.0543   Epoch: 5   Global Step: 65300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:51,204-Speed 3089.06 samples/sec   Loss 10.6975   LearningRate 0.0543   Epoch: 5   Global Step: 65310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:54,608-Speed 3009.28 samples/sec   Loss 10.8619   LearningRate 0.0543   Epoch: 5   Global Step: 65320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:52:57,934-Speed 3079.47 samples/sec   Loss 10.7533   LearningRate 0.0543   Epoch: 5   Global Step: 65330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:53:01,347-Speed 3001.60 samples/sec   Loss 10.7662   LearningRate 0.0543   Epoch: 5   Global Step: 65340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:53:04,675-Speed 3077.89 samples/sec   Loss 10.7052   LearningRate 0.0543   Epoch: 5   Global Step: 65350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:53:08,109-Speed 2983.35 samples/sec   Loss 10.7921   LearningRate 0.0543   Epoch: 5   Global Step: 65360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:53:11,470-Speed 3047.72 samples/sec   Loss 10.6530   LearningRate 0.0543   Epoch: 5   Global Step: 65370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:53:14,816-Speed 3060.79 samples/sec   Loss 10.5740   LearningRate 0.0543   Epoch: 5   Global Step: 65380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:53:18,131-Speed 3089.88 samples/sec   Loss 10.7836   LearningRate 0.0543   Epoch: 5   Global Step: 65390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:53:21,467-Speed 3070.32 samples/sec   Loss 10.7657   LearningRate 0.0543   Epoch: 5   Global Step: 65400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:53:24,844-Speed 3033.71 samples/sec   Loss 10.7137   LearningRate 0.0543   Epoch: 5   Global Step: 65410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:53:28,164-Speed 3085.02 samples/sec   Loss 10.6421   LearningRate 0.0543   Epoch: 5   Global Step: 65420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:53:31,620-Speed 2963.74 samples/sec   Loss 10.6673   LearningRate 0.0543   Epoch: 5   Global Step: 65430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:53:34,987-Speed 3042.57 samples/sec   Loss 10.6609   LearningRate 0.0543   Epoch: 5   Global Step: 65440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:53:38,359-Speed 3037.94 samples/sec   Loss 10.6990   LearningRate 0.0542   Epoch: 5   Global Step: 65450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:53:41,791-Speed 2984.76 samples/sec   Loss 10.7460   LearningRate 0.0542   Epoch: 5   Global Step: 65460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:53:45,252-Speed 2959.53 samples/sec   Loss 10.7478   LearningRate 0.0542   Epoch: 5   Global Step: 65470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:53:48,598-Speed 3060.93 samples/sec   Loss 10.7959   LearningRate 0.0542   Epoch: 5   Global Step: 65480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:53:51,897-Speed 3105.76 samples/sec   Loss 10.7891   LearningRate 0.0542   Epoch: 5   Global Step: 65490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:53:55,238-Speed 3066.30 samples/sec   Loss 10.7141   LearningRate 0.0542   Epoch: 5   Global Step: 65500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:53:58,689-Speed 2968.37 samples/sec   Loss 10.7715   LearningRate 0.0542   Epoch: 5   Global Step: 65510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:02,017-Speed 3077.63 samples/sec   Loss 10.7318   LearningRate 0.0542   Epoch: 5   Global Step: 65520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:05,402-Speed 3026.03 samples/sec   Loss 10.5821   LearningRate 0.0542   Epoch: 5   Global Step: 65530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:08,770-Speed 3041.36 samples/sec   Loss 10.6756   LearningRate 0.0542   Epoch: 5   Global Step: 65540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:12,188-Speed 2996.96 samples/sec   Loss 10.7050   LearningRate 0.0542   Epoch: 5   Global Step: 65550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:15,531-Speed 3063.32 samples/sec   Loss 10.7405   LearningRate 0.0542   Epoch: 5   Global Step: 65560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:18,870-Speed 3067.76 samples/sec   Loss 10.7285   LearningRate 0.0542   Epoch: 5   Global Step: 65570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:22,184-Speed 3090.94 samples/sec   Loss 10.6332   LearningRate 0.0542   Epoch: 5   Global Step: 65580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:25,571-Speed 3024.63 samples/sec   Loss 10.8082   LearningRate 0.0542   Epoch: 5   Global Step: 65590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:54:28,958-Speed 3023.52 samples/sec   Loss 10.8322   LearningRate 0.0542   Epoch: 5   Global Step: 65600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:54:32,374-Speed 2998.88 samples/sec   Loss 10.7941   LearningRate 0.0542   Epoch: 5   Global Step: 65610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:54:35,731-Speed 3050.86 samples/sec   Loss 10.6839   LearningRate 0.0541   Epoch: 5   Global Step: 65620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:39,101-Speed 3039.82 samples/sec   Loss 10.6138   LearningRate 0.0541   Epoch: 5   Global Step: 65630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:42,461-Speed 3048.32 samples/sec   Loss 10.7012   LearningRate 0.0541   Epoch: 5   Global Step: 65640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:45,804-Speed 3063.72 samples/sec   Loss 10.6601   LearningRate 0.0541   Epoch: 5   Global Step: 65650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:49,133-Speed 3077.40 samples/sec   Loss 10.7972   LearningRate 0.0541   Epoch: 5   Global Step: 65660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:52,501-Speed 3040.95 samples/sec   Loss 10.7352   LearningRate 0.0541   Epoch: 5   Global Step: 65670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:55,840-Speed 3067.86 samples/sec   Loss 10.7868   LearningRate 0.0541   Epoch: 5   Global Step: 65680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:54:59,186-Speed 3061.23 samples/sec   Loss 10.7478   LearningRate 0.0541   Epoch: 5   Global Step: 65690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:02,537-Speed 3056.86 samples/sec   Loss 10.7550   LearningRate 0.0541   Epoch: 5   Global Step: 65700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:05,912-Speed 3034.94 samples/sec   Loss 10.6302   LearningRate 0.0541   Epoch: 5   Global Step: 65710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:09,300-Speed 3023.24 samples/sec   Loss 10.6196   LearningRate 0.0541   Epoch: 5   Global Step: 65720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:55:12,638-Speed 3068.50 samples/sec   Loss 10.6415   LearningRate 0.0541   Epoch: 5   Global Step: 65730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:15,988-Speed 3057.50 samples/sec   Loss 10.6640   LearningRate 0.0541   Epoch: 5   Global Step: 65740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:19,474-Speed 2938.70 samples/sec   Loss 10.5794   LearningRate 0.0541   Epoch: 5   Global Step: 65750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:22,823-Speed 3058.70 samples/sec   Loss 10.7871   LearningRate 0.0541   Epoch: 5   Global Step: 65760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:26,207-Speed 3026.00 samples/sec   Loss 10.6960   LearningRate 0.0541   Epoch: 5   Global Step: 65770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:29,641-Speed 2983.39 samples/sec   Loss 10.8620   LearningRate 0.0541   Epoch: 5   Global Step: 65780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:33,135-Speed 2932.01 samples/sec   Loss 10.5894   LearningRate 0.0540   Epoch: 5   Global Step: 65790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:36,491-Speed 3052.04 samples/sec   Loss 10.8274   LearningRate 0.0540   Epoch: 5   Global Step: 65800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:39,939-Speed 2970.79 samples/sec   Loss 10.7344   LearningRate 0.0540   Epoch: 5   Global Step: 65810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:43,369-Speed 2986.25 samples/sec   Loss 10.8136   LearningRate 0.0540   Epoch: 5   Global Step: 65820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:55:46,757-Speed 3023.04 samples/sec   Loss 10.6432   LearningRate 0.0540   Epoch: 5   Global Step: 65830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:55:50,272-Speed 2914.44 samples/sec   Loss 10.7327   LearningRate 0.0540   Epoch: 5   Global Step: 65840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:55:53,633-Speed 3047.29 samples/sec   Loss 10.6697   LearningRate 0.0540   Epoch: 5   Global Step: 65850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:55:57,068-Speed 2981.25 samples/sec   Loss 10.7392   LearningRate 0.0540   Epoch: 5   Global Step: 65860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:56:00,560-Speed 2934.08 samples/sec   Loss 10.7960   LearningRate 0.0540   Epoch: 5   Global Step: 65870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:56:03,869-Speed 3095.68 samples/sec   Loss 10.6664   LearningRate 0.0540   Epoch: 5   Global Step: 65880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:56:07,281-Speed 3001.27 samples/sec   Loss 10.7440   LearningRate 0.0540   Epoch: 5   Global Step: 65890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:56:10,729-Speed 2971.43 samples/sec   Loss 10.7427   LearningRate 0.0540   Epoch: 5   Global Step: 65900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:56:14,248-Speed 2910.49 samples/sec   Loss 10.5801   LearningRate 0.0540   Epoch: 5   Global Step: 65910   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:56:17,607-Speed 3049.74 samples/sec   Loss 10.7318   LearningRate 0.0540   Epoch: 5   Global Step: 65920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:56:21,029-Speed 2993.06 samples/sec   Loss 10.8733   LearningRate 0.0540   Epoch: 5   Global Step: 65930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:56:24,371-Speed 3065.09 samples/sec   Loss 10.6831   LearningRate 0.0540   Epoch: 5   Global Step: 65940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:56:27,750-Speed 3031.08 samples/sec   Loss 10.7688   LearningRate 0.0540   Epoch: 5   Global Step: 65950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:56:31,096-Speed 3061.18 samples/sec   Loss 10.4862   LearningRate 0.0539   Epoch: 5   Global Step: 65960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:56:34,555-Speed 2961.61 samples/sec   Loss 10.7343   LearningRate 0.0539   Epoch: 5   Global Step: 65970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:56:37,929-Speed 3035.52 samples/sec   Loss 10.6260   LearningRate 0.0539   Epoch: 5   Global Step: 65980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:56:41,411-Speed 2941.66 samples/sec   Loss 10.7885   LearningRate 0.0539   Epoch: 5   Global Step: 65990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:56:44,860-Speed 2970.52 samples/sec   Loss 10.8577   LearningRate 0.0539   Epoch: 5   Global Step: 66000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:56:48,342-Speed 2941.29 samples/sec   Loss 10.7123   LearningRate 0.0539   Epoch: 5   Global Step: 66010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:56:51,720-Speed 3033.05 samples/sec   Loss 10.6951   LearningRate 0.0539   Epoch: 5   Global Step: 66020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:56:55,189-Speed 2952.56 samples/sec   Loss 10.7259   LearningRate 0.0539   Epoch: 5   Global Step: 66030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:56:58,547-Speed 3050.12 samples/sec   Loss 10.6306   LearningRate 0.0539   Epoch: 5   Global Step: 66040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:57:01,978-Speed 2986.23 samples/sec   Loss 10.7356   LearningRate 0.0539   Epoch: 5   Global Step: 66050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:57:05,445-Speed 2954.37 samples/sec   Loss 10.7567   LearningRate 0.0539   Epoch: 5   Global Step: 66060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:08,822-Speed 3033.68 samples/sec   Loss 10.7666   LearningRate 0.0539   Epoch: 5   Global Step: 66070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:12,238-Speed 2998.20 samples/sec   Loss 10.7375   LearningRate 0.0539   Epoch: 5   Global Step: 66080   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:15,683-Speed 2973.11 samples/sec   Loss 10.5962   LearningRate 0.0539   Epoch: 5   Global Step: 66090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:19,068-Speed 3026.33 samples/sec   Loss 10.7339   LearningRate 0.0539   Epoch: 5   Global Step: 66100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:22,427-Speed 3050.15 samples/sec   Loss 10.8696   LearningRate 0.0539   Epoch: 5   Global Step: 66110   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:25,832-Speed 3007.66 samples/sec   Loss 10.7003   LearningRate 0.0539   Epoch: 5   Global Step: 66120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:29,310-Speed 2944.62 samples/sec   Loss 10.7084   LearningRate 0.0538   Epoch: 5   Global Step: 66130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:32,664-Speed 3054.52 samples/sec   Loss 10.7493   LearningRate 0.0538   Epoch: 5   Global Step: 66140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:36,053-Speed 3022.12 samples/sec   Loss 10.7715   LearningRate 0.0538   Epoch: 5   Global Step: 66150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:39,392-Speed 3067.87 samples/sec   Loss 10.5633   LearningRate 0.0538   Epoch: 5   Global Step: 66160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:42,723-Speed 3075.54 samples/sec   Loss 10.6867   LearningRate 0.0538   Epoch: 5   Global Step: 66170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:46,044-Speed 3084.27 samples/sec   Loss 10.7830   LearningRate 0.0538   Epoch: 5   Global Step: 66180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:49,460-Speed 2998.32 samples/sec   Loss 10.7308   LearningRate 0.0538   Epoch: 5   Global Step: 66190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:52,853-Speed 3018.59 samples/sec   Loss 10.7854   LearningRate 0.0538   Epoch: 5   Global Step: 66200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:56,262-Speed 3005.25 samples/sec   Loss 10.7370   LearningRate 0.0538   Epoch: 5   Global Step: 66210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:57:59,690-Speed 2987.66 samples/sec   Loss 10.8925   LearningRate 0.0538   Epoch: 5   Global Step: 66220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:58:03,088-Speed 3014.42 samples/sec   Loss 10.8577   LearningRate 0.0538   Epoch: 5   Global Step: 66230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:58:06,556-Speed 2953.69 samples/sec   Loss 10.7449   LearningRate 0.0538   Epoch: 5   Global Step: 66240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:58:09,945-Speed 3022.27 samples/sec   Loss 10.6715   LearningRate 0.0538   Epoch: 5   Global Step: 66250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:58:13,346-Speed 3011.26 samples/sec   Loss 10.8933   LearningRate 0.0538   Epoch: 5   Global Step: 66260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:58:16,701-Speed 3053.29 samples/sec   Loss 10.8231   LearningRate 0.0538   Epoch: 5   Global Step: 66270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:58:20,056-Speed 3053.35 samples/sec   Loss 10.7559   LearningRate 0.0538   Epoch: 5   Global Step: 66280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:58:23,436-Speed 3029.94 samples/sec   Loss 10.6013   LearningRate 0.0538   Epoch: 5   Global Step: 66290   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:58:26,782-Speed 3061.05 samples/sec   Loss 10.7963   LearningRate 0.0537   Epoch: 5   Global Step: 66300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:58:30,100-Speed 3088.52 samples/sec   Loss 10.6059   LearningRate 0.0537   Epoch: 5   Global Step: 66310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:58:33,453-Speed 3054.91 samples/sec   Loss 10.6700   LearningRate 0.0537   Epoch: 5   Global Step: 66320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:58:36,808-Speed 3053.44 samples/sec   Loss 10.6728   LearningRate 0.0537   Epoch: 5   Global Step: 66330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:58:40,204-Speed 3015.80 samples/sec   Loss 10.7267   LearningRate 0.0537   Epoch: 5   Global Step: 66340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:58:43,625-Speed 2994.71 samples/sec   Loss 10.8694   LearningRate 0.0537   Epoch: 5   Global Step: 66350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:58:46,988-Speed 3045.25 samples/sec   Loss 10.7261   LearningRate 0.0537   Epoch: 5   Global Step: 66360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:58:50,363-Speed 3035.34 samples/sec   Loss 10.6919   LearningRate 0.0537   Epoch: 5   Global Step: 66370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:58:53,710-Speed 3059.70 samples/sec   Loss 10.7683   LearningRate 0.0537   Epoch: 5   Global Step: 66380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:58:57,031-Speed 3084.42 samples/sec   Loss 10.7358   LearningRate 0.0537   Epoch: 5   Global Step: 66390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:59:00,461-Speed 2986.08 samples/sec   Loss 10.7172   LearningRate 0.0537   Epoch: 5   Global Step: 66400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:59:03,811-Speed 3058.12 samples/sec   Loss 10.6795   LearningRate 0.0537   Epoch: 5   Global Step: 66410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:59:07,162-Speed 3056.38 samples/sec   Loss 10.8024   LearningRate 0.0537   Epoch: 5   Global Step: 66420   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:59:10,541-Speed 3032.11 samples/sec   Loss 10.6860   LearningRate 0.0537   Epoch: 5   Global Step: 66430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:59:13,871-Speed 3075.86 samples/sec   Loss 10.8269   LearningRate 0.0537   Epoch: 5   Global Step: 66440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:59:17,220-Speed 3059.32 samples/sec   Loss 10.7616   LearningRate 0.0537   Epoch: 5   Global Step: 66450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:59:20,650-Speed 2985.20 samples/sec   Loss 10.6471   LearningRate 0.0537   Epoch: 5   Global Step: 66460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:59:24,052-Speed 3011.01 samples/sec   Loss 10.7135   LearningRate 0.0536   Epoch: 5   Global Step: 66470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:59:27,414-Speed 3046.86 samples/sec   Loss 10.7473   LearningRate 0.0536   Epoch: 5   Global Step: 66480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:59:30,812-Speed 3014.43 samples/sec   Loss 10.7390   LearningRate 0.0536   Epoch: 5   Global Step: 66490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:59:34,224-Speed 3002.37 samples/sec   Loss 10.7288   LearningRate 0.0536   Epoch: 5   Global Step: 66500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 07:59:37,552-Speed 3077.20 samples/sec   Loss 10.6731   LearningRate 0.0536   Epoch: 5   Global Step: 66510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:59:40,976-Speed 2992.64 samples/sec   Loss 10.7567   LearningRate 0.0536   Epoch: 5   Global Step: 66520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:59:44,419-Speed 2974.84 samples/sec   Loss 10.6476   LearningRate 0.0536   Epoch: 5   Global Step: 66530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:59:47,929-Speed 2918.15 samples/sec   Loss 10.8218   LearningRate 0.0536   Epoch: 5   Global Step: 66540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:59:51,331-Speed 3010.41 samples/sec   Loss 10.7018   LearningRate 0.0536   Epoch: 5   Global Step: 66550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:59:54,751-Speed 2994.93 samples/sec   Loss 10.5815   LearningRate 0.0536   Epoch: 5   Global Step: 66560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 07:59:58,147-Speed 3016.57 samples/sec   Loss 10.7703   LearningRate 0.0536   Epoch: 5   Global Step: 66570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:00:01,593-Speed 2972.66 samples/sec   Loss 10.6692   LearningRate 0.0536   Epoch: 5   Global Step: 66580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:00:04,944-Speed 3056.81 samples/sec   Loss 10.8008   LearningRate 0.0536   Epoch: 5   Global Step: 66590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:00:08,299-Speed 3053.00 samples/sec   Loss 10.6768   LearningRate 0.0536   Epoch: 5   Global Step: 66600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:00:11,735-Speed 2981.02 samples/sec   Loss 10.6653   LearningRate 0.0536   Epoch: 5   Global Step: 66610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:00:15,148-Speed 3001.04 samples/sec   Loss 10.5833   LearningRate 0.0536   Epoch: 5   Global Step: 66620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:00:18,545-Speed 3016.27 samples/sec   Loss 10.6965   LearningRate 0.0536   Epoch: 5   Global Step: 66630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:00:21,897-Speed 3055.32 samples/sec   Loss 10.5915   LearningRate 0.0535   Epoch: 5   Global Step: 66640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:00:25,253-Speed 3052.46 samples/sec   Loss 10.7545   LearningRate 0.0535   Epoch: 5   Global Step: 66650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:00:28,646-Speed 3018.52 samples/sec   Loss 10.8225   LearningRate 0.0535   Epoch: 5   Global Step: 66660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:00:31,992-Speed 3061.11 samples/sec   Loss 10.7613   LearningRate 0.0535   Epoch: 5   Global Step: 66670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:00:35,375-Speed 3028.08 samples/sec   Loss 10.7635   LearningRate 0.0535   Epoch: 5   Global Step: 66680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:00:38,759-Speed 3027.05 samples/sec   Loss 10.8674   LearningRate 0.0535   Epoch: 5   Global Step: 66690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:00:42,143-Speed 3026.92 samples/sec   Loss 10.8175   LearningRate 0.0535   Epoch: 5   Global Step: 66700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:00:45,500-Speed 3050.74 samples/sec   Loss 10.7858   LearningRate 0.0535   Epoch: 5   Global Step: 66710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:00:48,854-Speed 3054.54 samples/sec   Loss 10.7428   LearningRate 0.0535   Epoch: 5   Global Step: 66720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:00:52,337-Speed 2940.29 samples/sec   Loss 10.9403   LearningRate 0.0535   Epoch: 5   Global Step: 66730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:00:55,724-Speed 3024.81 samples/sec   Loss 10.6059   LearningRate 0.0535   Epoch: 5   Global Step: 66740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:00:59,063-Speed 3067.24 samples/sec   Loss 10.7714   LearningRate 0.0535   Epoch: 5   Global Step: 66750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:01:02,437-Speed 3035.95 samples/sec   Loss 10.7440   LearningRate 0.0535   Epoch: 5   Global Step: 66760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:01:05,852-Speed 2999.15 samples/sec   Loss 10.6723   LearningRate 0.0535   Epoch: 5   Global Step: 66770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:01:09,204-Speed 3056.14 samples/sec   Loss 10.7563   LearningRate 0.0535   Epoch: 5   Global Step: 66780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:01:12,557-Speed 3054.66 samples/sec   Loss 10.7768   LearningRate 0.0535   Epoch: 5   Global Step: 66790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:01:15,896-Speed 3068.17 samples/sec   Loss 10.7478   LearningRate 0.0535   Epoch: 5   Global Step: 66800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:01:19,269-Speed 3036.59 samples/sec   Loss 10.7080   LearningRate 0.0534   Epoch: 5   Global Step: 66810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:01:22,614-Speed 3062.37 samples/sec   Loss 10.6169   LearningRate 0.0534   Epoch: 5   Global Step: 66820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:01:25,980-Speed 3043.60 samples/sec   Loss 10.6756   LearningRate 0.0534   Epoch: 5   Global Step: 66830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:01:29,351-Speed 3038.51 samples/sec   Loss 10.6408   LearningRate 0.0534   Epoch: 5   Global Step: 66840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:01:32,707-Speed 3052.12 samples/sec   Loss 10.7185   LearningRate 0.0534   Epoch: 5   Global Step: 66850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:01:36,147-Speed 2977.36 samples/sec   Loss 10.6914   LearningRate 0.0534   Epoch: 5   Global Step: 66860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:01:39,619-Speed 2949.97 samples/sec   Loss 10.7722   LearningRate 0.0534   Epoch: 5   Global Step: 66870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:01:43,016-Speed 3015.68 samples/sec   Loss 10.7421   LearningRate 0.0534   Epoch: 5   Global Step: 66880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:01:46,413-Speed 3015.18 samples/sec   Loss 10.7500   LearningRate 0.0534   Epoch: 5   Global Step: 66890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:01:49,784-Speed 3039.39 samples/sec   Loss 10.8280   LearningRate 0.0534   Epoch: 5   Global Step: 66900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:01:53,094-Speed 3094.27 samples/sec   Loss 10.7813   LearningRate 0.0534   Epoch: 5   Global Step: 66910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:01:56,458-Speed 3044.58 samples/sec   Loss 10.6742   LearningRate 0.0534   Epoch: 5   Global Step: 66920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:01:59,844-Speed 3024.93 samples/sec   Loss 10.6434   LearningRate 0.0534   Epoch: 5   Global Step: 66930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:02:03,287-Speed 2975.00 samples/sec   Loss 10.6642   LearningRate 0.0534   Epoch: 5   Global Step: 66940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:02:06,704-Speed 2998.04 samples/sec   Loss 10.8657   LearningRate 0.0534   Epoch: 5   Global Step: 66950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:02:10,133-Speed 2986.58 samples/sec   Loss 10.7198   LearningRate 0.0534   Epoch: 5   Global Step: 66960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:02:13,483-Speed 3057.70 samples/sec   Loss 10.6795   LearningRate 0.0534   Epoch: 5   Global Step: 66970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:02:16,863-Speed 3030.53 samples/sec   Loss 10.6332   LearningRate 0.0533   Epoch: 5   Global Step: 66980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:02:20,246-Speed 3028.12 samples/sec   Loss 10.6551   LearningRate 0.0533   Epoch: 5   Global Step: 66990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:02:23,658-Speed 3001.21 samples/sec   Loss 10.7208   LearningRate 0.0533   Epoch: 5   Global Step: 67000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:02:27,127-Speed 2952.77 samples/sec   Loss 10.6986   LearningRate 0.0533   Epoch: 5   Global Step: 67010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:02:30,444-Speed 3088.27 samples/sec   Loss 10.7268   LearningRate 0.0533   Epoch: 5   Global Step: 67020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:02:33,796-Speed 3056.76 samples/sec   Loss 10.7955   LearningRate 0.0533   Epoch: 5   Global Step: 67030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:02:37,179-Speed 3026.90 samples/sec   Loss 10.5147   LearningRate 0.0533   Epoch: 5   Global Step: 67040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:02:40,567-Speed 3023.42 samples/sec   Loss 10.6530   LearningRate 0.0533   Epoch: 5   Global Step: 67050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:02:43,919-Speed 3056.04 samples/sec   Loss 10.6570   LearningRate 0.0533   Epoch: 5   Global Step: 67060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:02:47,332-Speed 3001.51 samples/sec   Loss 10.7217   LearningRate 0.0533   Epoch: 5   Global Step: 67070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:02:50,668-Speed 3070.26 samples/sec   Loss 10.8263   LearningRate 0.0533   Epoch: 5   Global Step: 67080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:02:54,047-Speed 3031.46 samples/sec   Loss 10.8062   LearningRate 0.0533   Epoch: 5   Global Step: 67090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:02:57,414-Speed 3041.60 samples/sec   Loss 10.8308   LearningRate 0.0533   Epoch: 5   Global Step: 67100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:03:00,912-Speed 2928.51 samples/sec   Loss 10.8300   LearningRate 0.0533   Epoch: 5   Global Step: 67110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:03:04,267-Speed 3053.01 samples/sec   Loss 10.7318   LearningRate 0.0533   Epoch: 5   Global Step: 67120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:03:07,617-Speed 3057.85 samples/sec   Loss 10.6815   LearningRate 0.0533   Epoch: 5   Global Step: 67130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:03:10,996-Speed 3031.91 samples/sec   Loss 10.8334   LearningRate 0.0533   Epoch: 5   Global Step: 67140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:03:14,440-Speed 2974.06 samples/sec   Loss 10.6084   LearningRate 0.0532   Epoch: 5   Global Step: 67150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:17,832-Speed 3020.06 samples/sec   Loss 10.7531   LearningRate 0.0532   Epoch: 5   Global Step: 67160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:21,196-Speed 3044.87 samples/sec   Loss 10.7698   LearningRate 0.0532   Epoch: 5   Global Step: 67170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:24,544-Speed 3059.17 samples/sec   Loss 10.6388   LearningRate 0.0532   Epoch: 5   Global Step: 67180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:27,930-Speed 3025.05 samples/sec   Loss 10.6577   LearningRate 0.0532   Epoch: 5   Global Step: 67190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:31,264-Speed 3072.59 samples/sec   Loss 10.8266   LearningRate 0.0532   Epoch: 5   Global Step: 67200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:34,687-Speed 2992.21 samples/sec   Loss 10.7953   LearningRate 0.0532   Epoch: 5   Global Step: 67210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:38,041-Speed 3053.89 samples/sec   Loss 10.7433   LearningRate 0.0532   Epoch: 5   Global Step: 67220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:41,430-Speed 3022.58 samples/sec   Loss 10.5769   LearningRate 0.0532   Epoch: 5   Global Step: 67230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:44,824-Speed 3018.35 samples/sec   Loss 10.6782   LearningRate 0.0532   Epoch: 5   Global Step: 67240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:48,150-Speed 3079.91 samples/sec   Loss 10.6509   LearningRate 0.0532   Epoch: 5   Global Step: 67250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:51,476-Speed 3079.45 samples/sec   Loss 10.6387   LearningRate 0.0532   Epoch: 5   Global Step: 67260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:54,792-Speed 3089.19 samples/sec   Loss 10.7303   LearningRate 0.0532   Epoch: 5   Global Step: 67270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:03:58,152-Speed 3047.96 samples/sec   Loss 10.7297   LearningRate 0.0532   Epoch: 5   Global Step: 67280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:01,510-Speed 3050.36 samples/sec   Loss 10.6650   LearningRate 0.0532   Epoch: 5   Global Step: 67290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:04,894-Speed 3026.59 samples/sec   Loss 10.6303   LearningRate 0.0532   Epoch: 5   Global Step: 67300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:08,280-Speed 3025.75 samples/sec   Loss 10.7251   LearningRate 0.0532   Epoch: 5   Global Step: 67310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:11,708-Speed 2987.92 samples/sec   Loss 10.6078   LearningRate 0.0531   Epoch: 5   Global Step: 67320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:15,151-Speed 2974.66 samples/sec   Loss 10.7807   LearningRate 0.0531   Epoch: 5   Global Step: 67330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:18,521-Speed 3039.48 samples/sec   Loss 10.6142   LearningRate 0.0531   Epoch: 5   Global Step: 67340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:21,946-Speed 2990.26 samples/sec   Loss 10.5989   LearningRate 0.0531   Epoch: 5   Global Step: 67350   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-04-27 08:04:25,420-Speed 2949.89 samples/sec   Loss 10.7226   LearningRate 0.0531   Epoch: 5   Global Step: 67360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:28,788-Speed 3040.56 samples/sec   Loss 10.5936   LearningRate 0.0531   Epoch: 5   Global Step: 67370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:32,189-Speed 3011.54 samples/sec   Loss 10.6476   LearningRate 0.0531   Epoch: 5   Global Step: 67380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:35,555-Speed 3042.96 samples/sec   Loss 10.7759   LearningRate 0.0531   Epoch: 5   Global Step: 67390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:38,949-Speed 3019.05 samples/sec   Loss 10.7656   LearningRate 0.0531   Epoch: 5   Global Step: 67400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:42,311-Speed 3046.20 samples/sec   Loss 10.6993   LearningRate 0.0531   Epoch: 5   Global Step: 67410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:45,686-Speed 3035.01 samples/sec   Loss 10.4840   LearningRate 0.0531   Epoch: 5   Global Step: 67420   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:49,051-Speed 3044.27 samples/sec   Loss 10.7926   LearningRate 0.0531   Epoch: 5   Global Step: 67430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:52,405-Speed 3053.82 samples/sec   Loss 10.7368   LearningRate 0.0531   Epoch: 5   Global Step: 67440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:55,852-Speed 2971.65 samples/sec   Loss 10.5432   LearningRate 0.0531   Epoch: 5   Global Step: 67450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:04:59,298-Speed 2972.69 samples/sec   Loss 10.5743   LearningRate 0.0531   Epoch: 5   Global Step: 67460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:05:02,745-Speed 2971.47 samples/sec   Loss 10.7866   LearningRate 0.0531   Epoch: 5   Global Step: 67470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:05:06,124-Speed 3031.19 samples/sec   Loss 10.8294   LearningRate 0.0531   Epoch: 5   Global Step: 67480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:05:09,541-Speed 2997.68 samples/sec   Loss 10.7359   LearningRate 0.0530   Epoch: 5   Global Step: 67490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:05:12,917-Speed 3034.05 samples/sec   Loss 10.9050   LearningRate 0.0530   Epoch: 5   Global Step: 67500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:05:16,278-Speed 3048.04 samples/sec   Loss 10.7911   LearningRate 0.0530   Epoch: 5   Global Step: 67510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:05:19,739-Speed 2959.52 samples/sec   Loss 10.6457   LearningRate 0.0530   Epoch: 5   Global Step: 67520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:05:23,149-Speed 3003.17 samples/sec   Loss 10.7635   LearningRate 0.0530   Epoch: 5   Global Step: 67530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:05:26,535-Speed 3025.77 samples/sec   Loss 10.5966   LearningRate 0.0530   Epoch: 5   Global Step: 67540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:05:30,011-Speed 2946.61 samples/sec   Loss 10.6967   LearningRate 0.0530   Epoch: 5   Global Step: 67550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:05:33,381-Speed 3039.57 samples/sec   Loss 10.7860   LearningRate 0.0530   Epoch: 5   Global Step: 67560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:05:36,814-Speed 2983.47 samples/sec   Loss 10.7720   LearningRate 0.0530   Epoch: 5   Global Step: 67570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:05:40,187-Speed 3036.72 samples/sec   Loss 10.7617   LearningRate 0.0530   Epoch: 5   Global Step: 67580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:05:43,514-Speed 3078.80 samples/sec   Loss 10.7091   LearningRate 0.0530   Epoch: 5   Global Step: 67590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:05:46,892-Speed 3031.64 samples/sec   Loss 10.7164   LearningRate 0.0530   Epoch: 5   Global Step: 67600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:05:50,257-Speed 3044.48 samples/sec   Loss 10.7892   LearningRate 0.0530   Epoch: 5   Global Step: 67610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:05:53,672-Speed 2999.55 samples/sec   Loss 10.7900   LearningRate 0.0530   Epoch: 5   Global Step: 67620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:05:57,084-Speed 3002.07 samples/sec   Loss 10.7176   LearningRate 0.0530   Epoch: 5   Global Step: 67630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:06:00,427-Speed 3063.66 samples/sec   Loss 10.5414   LearningRate 0.0530   Epoch: 5   Global Step: 67640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:06:03,878-Speed 2968.41 samples/sec   Loss 10.6004   LearningRate 0.0530   Epoch: 5   Global Step: 67650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:06:07,321-Speed 2974.75 samples/sec   Loss 10.8354   LearningRate 0.0529   Epoch: 5   Global Step: 67660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:06:10,655-Speed 3072.83 samples/sec   Loss 10.8447   LearningRate 0.0529   Epoch: 5   Global Step: 67670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:06:14,065-Speed 3003.25 samples/sec   Loss 10.6623   LearningRate 0.0529   Epoch: 5   Global Step: 67680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:06:17,402-Speed 3069.56 samples/sec   Loss 10.7494   LearningRate 0.0529   Epoch: 5   Global Step: 67690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:06:20,853-Speed 2968.60 samples/sec   Loss 10.7336   LearningRate 0.0529   Epoch: 5   Global Step: 67700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:06:24,206-Speed 3055.00 samples/sec   Loss 10.6503   LearningRate 0.0529   Epoch: 5   Global Step: 67710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:06:27,621-Speed 2999.39 samples/sec   Loss 10.8211   LearningRate 0.0529   Epoch: 5   Global Step: 67720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:06:30,963-Speed 3064.83 samples/sec   Loss 10.6921   LearningRate 0.0529   Epoch: 5   Global Step: 67730   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:06:34,372-Speed 3004.73 samples/sec   Loss 10.7460   LearningRate 0.0529   Epoch: 5   Global Step: 67740   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:06:37,809-Speed 2979.60 samples/sec   Loss 10.6238   LearningRate 0.0529   Epoch: 5   Global Step: 67750   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:06:41,235-Speed 2989.55 samples/sec   Loss 10.8084   LearningRate 0.0529   Epoch: 5   Global Step: 67760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:06:44,692-Speed 2963.79 samples/sec   Loss 10.7771   LearningRate 0.0529   Epoch: 5   Global Step: 67770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:06:48,149-Speed 2962.94 samples/sec   Loss 10.6448   LearningRate 0.0529   Epoch: 5   Global Step: 67780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:06:51,638-Speed 2936.07 samples/sec   Loss 10.7041   LearningRate 0.0529   Epoch: 5   Global Step: 67790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:06:55,086-Speed 2970.00 samples/sec   Loss 10.6903   LearningRate 0.0529   Epoch: 5   Global Step: 67800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:06:58,562-Speed 2946.72 samples/sec   Loss 10.7751   LearningRate 0.0529   Epoch: 5   Global Step: 67810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:01,909-Speed 3061.21 samples/sec   Loss 10.8012   LearningRate 0.0529   Epoch: 5   Global Step: 67820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:05,295-Speed 3024.75 samples/sec   Loss 10.4585   LearningRate 0.0528   Epoch: 5   Global Step: 67830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:08,717-Speed 2993.15 samples/sec   Loss 10.6626   LearningRate 0.0528   Epoch: 5   Global Step: 67840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:12,148-Speed 2985.62 samples/sec   Loss 10.6975   LearningRate 0.0528   Epoch: 5   Global Step: 67850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:15,552-Speed 3009.46 samples/sec   Loss 10.7986   LearningRate 0.0528   Epoch: 5   Global Step: 67860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:18,954-Speed 3010.57 samples/sec   Loss 10.6678   LearningRate 0.0528   Epoch: 5   Global Step: 67870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:22,340-Speed 3025.66 samples/sec   Loss 10.6437   LearningRate 0.0528   Epoch: 5   Global Step: 67880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:25,678-Speed 3068.52 samples/sec   Loss 10.6608   LearningRate 0.0528   Epoch: 5   Global Step: 67890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:29,093-Speed 2998.90 samples/sec   Loss 10.7001   LearningRate 0.0528   Epoch: 5   Global Step: 67900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:32,530-Speed 2980.57 samples/sec   Loss 10.6478   LearningRate 0.0528   Epoch: 5   Global Step: 67910   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:35,894-Speed 3044.96 samples/sec   Loss 10.7160   LearningRate 0.0528   Epoch: 5   Global Step: 67920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:39,318-Speed 2991.72 samples/sec   Loss 10.8014   LearningRate 0.0528   Epoch: 5   Global Step: 67930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:42,718-Speed 3012.56 samples/sec   Loss 10.7614   LearningRate 0.0528   Epoch: 5   Global Step: 67940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:46,171-Speed 2966.43 samples/sec   Loss 10.7054   LearningRate 0.0528   Epoch: 5   Global Step: 67950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:49,583-Speed 3001.46 samples/sec   Loss 10.6141   LearningRate 0.0528   Epoch: 5   Global Step: 67960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:52,950-Speed 3042.25 samples/sec   Loss 10.6882   LearningRate 0.0528   Epoch: 5   Global Step: 67970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:56,313-Speed 3045.67 samples/sec   Loss 10.7782   LearningRate 0.0528   Epoch: 5   Global Step: 67980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:07:59,735-Speed 2993.80 samples/sec   Loss 10.5843   LearningRate 0.0528   Epoch: 5   Global Step: 67990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:08:03,102-Speed 3041.98 samples/sec   Loss 10.7033   LearningRate 0.0527   Epoch: 5   Global Step: 68000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:08:06,508-Speed 3006.84 samples/sec   Loss 10.6143   LearningRate 0.0527   Epoch: 5   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:08:09,870-Speed 3047.41 samples/sec   Loss 10.5642   LearningRate 0.0527   Epoch: 5   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:08:13,228-Speed 3050.08 samples/sec   Loss 10.7817   LearningRate 0.0527   Epoch: 5   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:08:16,635-Speed 3006.04 samples/sec   Loss 10.6587   LearningRate 0.0527   Epoch: 5   Global Step: 68040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:08:20,000-Speed 3044.04 samples/sec   Loss 10.7572   LearningRate 0.0527   Epoch: 5   Global Step: 68050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:08:23,427-Speed 2989.38 samples/sec   Loss 10.6839   LearningRate 0.0527   Epoch: 5   Global Step: 68060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:08:26,839-Speed 3002.23 samples/sec   Loss 10.8045   LearningRate 0.0527   Epoch: 5   Global Step: 68070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:08:30,238-Speed 3013.46 samples/sec   Loss 10.7231   LearningRate 0.0527   Epoch: 5   Global Step: 68080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:08:33,657-Speed 2995.92 samples/sec   Loss 10.5848   LearningRate 0.0527   Epoch: 5   Global Step: 68090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:08:37,076-Speed 2995.58 samples/sec   Loss 10.6903   LearningRate 0.0527   Epoch: 5   Global Step: 68100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:08:40,431-Speed 3052.91 samples/sec   Loss 10.7271   LearningRate 0.0527   Epoch: 5   Global Step: 68110   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:08:43,861-Speed 2986.49 samples/sec   Loss 10.6996   LearningRate 0.0527   Epoch: 5   Global Step: 68120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:08:47,255-Speed 3018.26 samples/sec   Loss 10.6010   LearningRate 0.0527   Epoch: 5   Global Step: 68130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:08:50,639-Speed 3026.50 samples/sec   Loss 10.5644   LearningRate 0.0527   Epoch: 5   Global Step: 68140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:08:54,036-Speed 3015.50 samples/sec   Loss 10.6702   LearningRate 0.0527   Epoch: 5   Global Step: 68150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:08:57,464-Speed 2988.40 samples/sec   Loss 10.5686   LearningRate 0.0527   Epoch: 5   Global Step: 68160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:09:00,833-Speed 3040.16 samples/sec   Loss 10.6991   LearningRate 0.0526   Epoch: 5   Global Step: 68170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:09:04,211-Speed 3031.48 samples/sec   Loss 10.6700   LearningRate 0.0526   Epoch: 5   Global Step: 68180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:09:07,634-Speed 2993.03 samples/sec   Loss 10.6844   LearningRate 0.0526   Epoch: 5   Global Step: 68190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:09:11,123-Speed 2935.75 samples/sec   Loss 10.6550   LearningRate 0.0526   Epoch: 5   Global Step: 68200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:09:14,468-Speed 3061.98 samples/sec   Loss 10.7680   LearningRate 0.0526   Epoch: 5   Global Step: 68210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:09:17,845-Speed 3034.00 samples/sec   Loss 10.6750   LearningRate 0.0526   Epoch: 5   Global Step: 68220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:09:21,289-Speed 2974.30 samples/sec   Loss 10.6280   LearningRate 0.0526   Epoch: 5   Global Step: 68230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:09:24,745-Speed 2963.96 samples/sec   Loss 10.6984   LearningRate 0.0526   Epoch: 5   Global Step: 68240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:09:28,149-Speed 3009.09 samples/sec   Loss 10.6458   LearningRate 0.0526   Epoch: 5   Global Step: 68250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:09:31,629-Speed 2943.72 samples/sec   Loss 10.7140   LearningRate 0.0526   Epoch: 5   Global Step: 68260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:09:35,088-Speed 2961.18 samples/sec   Loss 10.6177   LearningRate 0.0526   Epoch: 5   Global Step: 68270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:09:38,415-Speed 3078.67 samples/sec   Loss 10.4955   LearningRate 0.0526   Epoch: 5   Global Step: 68280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:09:41,803-Speed 3023.77 samples/sec   Loss 10.5385   LearningRate 0.0526   Epoch: 5   Global Step: 68290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:09:45,168-Speed 3043.76 samples/sec   Loss 10.6832   LearningRate 0.0526   Epoch: 5   Global Step: 68300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:09:48,586-Speed 2996.83 samples/sec   Loss 10.5427   LearningRate 0.0526   Epoch: 5   Global Step: 68310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:09:51,991-Speed 3007.86 samples/sec   Loss 10.7618   LearningRate 0.0526   Epoch: 5   Global Step: 68320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:09:55,404-Speed 3001.80 samples/sec   Loss 10.6655   LearningRate 0.0526   Epoch: 5   Global Step: 68330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:09:58,749-Speed 3061.85 samples/sec   Loss 10.5728   LearningRate 0.0525   Epoch: 5   Global Step: 68340   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:10:02,156-Speed 3006.87 samples/sec   Loss 10.7355   LearningRate 0.0525   Epoch: 5   Global Step: 68350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:10:05,502-Speed 3060.76 samples/sec   Loss 10.6452   LearningRate 0.0525   Epoch: 5   Global Step: 68360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:10:08,905-Speed 3010.34 samples/sec   Loss 10.6369   LearningRate 0.0525   Epoch: 5   Global Step: 68370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:10:12,337-Speed 2984.46 samples/sec   Loss 10.6657   LearningRate 0.0525   Epoch: 5   Global Step: 68380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:10:16,658-Speed 2370.70 samples/sec   Loss 10.7003   LearningRate 0.0525   Epoch: 5   Global Step: 68390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:10:20,042-Speed 3026.20 samples/sec   Loss 10.5975   LearningRate 0.0525   Epoch: 5   Global Step: 68400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:10:23,489-Speed 2972.36 samples/sec   Loss 10.6498   LearningRate 0.0525   Epoch: 5   Global Step: 68410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:10:26,940-Speed 2967.72 samples/sec   Loss 10.6962   LearningRate 0.0525   Epoch: 5   Global Step: 68420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:10:30,351-Speed 3003.09 samples/sec   Loss 10.5735   LearningRate 0.0525   Epoch: 5   Global Step: 68430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:10:33,731-Speed 3031.05 samples/sec   Loss 10.6633   LearningRate 0.0525   Epoch: 5   Global Step: 68440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:10:37,057-Speed 3078.73 samples/sec   Loss 10.5704   LearningRate 0.0525   Epoch: 5   Global Step: 68450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:10:40,420-Speed 3045.94 samples/sec   Loss 10.7241   LearningRate 0.0525   Epoch: 5   Global Step: 68460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:10:43,751-Speed 3075.42 samples/sec   Loss 10.6937   LearningRate 0.0525   Epoch: 5   Global Step: 68470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:10:47,138-Speed 3023.63 samples/sec   Loss 10.6491   LearningRate 0.0525   Epoch: 5   Global Step: 68480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:10:50,558-Speed 2995.55 samples/sec   Loss 10.5797   LearningRate 0.0525   Epoch: 5   Global Step: 68490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:10:53,886-Speed 3077.45 samples/sec   Loss 10.4848   LearningRate 0.0525   Epoch: 5   Global Step: 68500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:10:57,196-Speed 3096.23 samples/sec   Loss 10.7289   LearningRate 0.0524   Epoch: 5   Global Step: 68510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:11:00,592-Speed 3015.82 samples/sec   Loss 10.5722   LearningRate 0.0524   Epoch: 5   Global Step: 68520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:11:03,934-Speed 3065.43 samples/sec   Loss 10.6418   LearningRate 0.0524   Epoch: 5   Global Step: 68530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:11:07,281-Speed 3060.69 samples/sec   Loss 10.4871   LearningRate 0.0524   Epoch: 5   Global Step: 68540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:11:10,760-Speed 2944.14 samples/sec   Loss 10.6076   LearningRate 0.0524   Epoch: 5   Global Step: 68550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:11:14,124-Speed 3044.56 samples/sec   Loss 10.5683   LearningRate 0.0524   Epoch: 5   Global Step: 68560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:11:17,519-Speed 3017.34 samples/sec   Loss 10.5925   LearningRate 0.0524   Epoch: 5   Global Step: 68570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:11:20,873-Speed 3054.06 samples/sec   Loss 10.6176   LearningRate 0.0524   Epoch: 5   Global Step: 68580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:11:24,230-Speed 3050.78 samples/sec   Loss 10.6667   LearningRate 0.0524   Epoch: 5   Global Step: 68590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:11:27,608-Speed 3033.82 samples/sec   Loss 10.7471   LearningRate 0.0524   Epoch: 5   Global Step: 68600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:11:30,972-Speed 3044.18 samples/sec   Loss 10.5124   LearningRate 0.0524   Epoch: 5   Global Step: 68610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:11:34,367-Speed 3017.14 samples/sec   Loss 10.6673   LearningRate 0.0524   Epoch: 5   Global Step: 68620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:11:37,773-Speed 3007.83 samples/sec   Loss 10.7556   LearningRate 0.0524   Epoch: 5   Global Step: 68630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:11:41,136-Speed 3046.16 samples/sec   Loss 10.5951   LearningRate 0.0524   Epoch: 5   Global Step: 68640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:11:44,516-Speed 3030.70 samples/sec   Loss 10.6359   LearningRate 0.0524   Epoch: 5   Global Step: 68650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:11:47,882-Speed 3042.38 samples/sec   Loss 10.6129   LearningRate 0.0524   Epoch: 5   Global Step: 68660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:11:51,292-Speed 3004.60 samples/sec   Loss 10.7106   LearningRate 0.0524   Epoch: 5   Global Step: 68670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:11:54,666-Speed 3035.22 samples/sec   Loss 10.7844   LearningRate 0.0523   Epoch: 5   Global Step: 68680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:11:58,062-Speed 3016.77 samples/sec   Loss 10.6647   LearningRate 0.0523   Epoch: 5   Global Step: 68690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:12:01,423-Speed 3047.86 samples/sec   Loss 10.4793   LearningRate 0.0523   Epoch: 5   Global Step: 68700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:12:04,818-Speed 3016.67 samples/sec   Loss 10.6465   LearningRate 0.0523   Epoch: 5   Global Step: 68710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:12:08,167-Speed 3058.86 samples/sec   Loss 10.7804   LearningRate 0.0523   Epoch: 5   Global Step: 68720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:12:11,537-Speed 3039.03 samples/sec   Loss 10.5121   LearningRate 0.0523   Epoch: 5   Global Step: 68730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:12:14,941-Speed 3009.39 samples/sec   Loss 10.9000   LearningRate 0.0523   Epoch: 5   Global Step: 68740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:12:18,396-Speed 2964.65 samples/sec   Loss 10.6993   LearningRate 0.0523   Epoch: 5   Global Step: 68750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:12:21,757-Speed 3047.79 samples/sec   Loss 10.7276   LearningRate 0.0523   Epoch: 5   Global Step: 68760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:12:25,113-Speed 3051.93 samples/sec   Loss 10.6693   LearningRate 0.0523   Epoch: 5   Global Step: 68770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:12:28,468-Speed 3054.60 samples/sec   Loss 10.5889   LearningRate 0.0523   Epoch: 5   Global Step: 68780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:12:31,926-Speed 2962.23 samples/sec   Loss 10.6015   LearningRate 0.0523   Epoch: 5   Global Step: 68790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:12:35,304-Speed 3033.32 samples/sec   Loss 10.5562   LearningRate 0.0523   Epoch: 5   Global Step: 68800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:12:38,650-Speed 3060.56 samples/sec   Loss 10.6508   LearningRate 0.0523   Epoch: 5   Global Step: 68810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:12:42,014-Speed 3044.62 samples/sec   Loss 10.7257   LearningRate 0.0523   Epoch: 5   Global Step: 68820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:12:45,372-Speed 3050.25 samples/sec   Loss 10.5395   LearningRate 0.0523   Epoch: 5   Global Step: 68830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:12:48,780-Speed 3006.19 samples/sec   Loss 10.5668   LearningRate 0.0523   Epoch: 5   Global Step: 68840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:12:52,177-Speed 3014.98 samples/sec   Loss 10.4655   LearningRate 0.0523   Epoch: 5   Global Step: 68850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:12:55,510-Speed 3072.83 samples/sec   Loss 10.6507   LearningRate 0.0522   Epoch: 5   Global Step: 68860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:12:58,903-Speed 3020.20 samples/sec   Loss 10.6027   LearningRate 0.0522   Epoch: 5   Global Step: 68870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:13:02,285-Speed 3028.99 samples/sec   Loss 10.5544   LearningRate 0.0522   Epoch: 5   Global Step: 68880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:13:05,754-Speed 2952.38 samples/sec   Loss 10.6270   LearningRate 0.0522   Epoch: 5   Global Step: 68890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:13:09,172-Speed 2996.92 samples/sec   Loss 10.5423   LearningRate 0.0522   Epoch: 5   Global Step: 68900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:13:12,572-Speed 3013.03 samples/sec   Loss 10.6322   LearningRate 0.0522   Epoch: 5   Global Step: 68910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:13:15,999-Speed 2988.56 samples/sec   Loss 10.6773   LearningRate 0.0522   Epoch: 5   Global Step: 68920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:13:19,484-Speed 2939.09 samples/sec   Loss 10.7705   LearningRate 0.0522   Epoch: 5   Global Step: 68930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:13:22,904-Speed 2995.25 samples/sec   Loss 10.6132   LearningRate 0.0522   Epoch: 5   Global Step: 68940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:13:26,337-Speed 2983.78 samples/sec   Loss 10.6784   LearningRate 0.0522   Epoch: 5   Global Step: 68950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:13:29,691-Speed 3053.84 samples/sec   Loss 10.8037   LearningRate 0.0522   Epoch: 5   Global Step: 68960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:13:33,052-Speed 3047.25 samples/sec   Loss 10.5606   LearningRate 0.0522   Epoch: 5   Global Step: 68970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:13:36,470-Speed 2997.48 samples/sec   Loss 10.5491   LearningRate 0.0522   Epoch: 5   Global Step: 68980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:13:39,826-Speed 3052.26 samples/sec   Loss 10.5792   LearningRate 0.0522   Epoch: 5   Global Step: 68990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:13:43,184-Speed 3049.85 samples/sec   Loss 10.6236   LearningRate 0.0522   Epoch: 5   Global Step: 69000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:13:46,555-Speed 3038.75 samples/sec   Loss 10.7673   LearningRate 0.0522   Epoch: 5   Global Step: 69010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:13:49,867-Speed 3092.96 samples/sec   Loss 10.4815   LearningRate 0.0522   Epoch: 5   Global Step: 69020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:13:53,208-Speed 3065.61 samples/sec   Loss 10.6153   LearningRate 0.0521   Epoch: 5   Global Step: 69030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:13:56,617-Speed 3006.32 samples/sec   Loss 10.6084   LearningRate 0.0521   Epoch: 5   Global Step: 69040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:14:00,029-Speed 3001.84 samples/sec   Loss 10.6476   LearningRate 0.0521   Epoch: 5   Global Step: 69050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:14:03,406-Speed 3033.19 samples/sec   Loss 10.6508   LearningRate 0.0521   Epoch: 5   Global Step: 69060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:14:06,847-Speed 2976.61 samples/sec   Loss 10.7111   LearningRate 0.0521   Epoch: 5   Global Step: 69070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:14:10,166-Speed 3086.03 samples/sec   Loss 10.6233   LearningRate 0.0521   Epoch: 5   Global Step: 69080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:14:13,569-Speed 3011.30 samples/sec   Loss 10.4203   LearningRate 0.0521   Epoch: 5   Global Step: 69090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:14:16,915-Speed 3061.48 samples/sec   Loss 10.6765   LearningRate 0.0521   Epoch: 5   Global Step: 69100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:14:20,284-Speed 3039.60 samples/sec   Loss 10.7655   LearningRate 0.0521   Epoch: 5   Global Step: 69110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:14:23,667-Speed 3028.16 samples/sec   Loss 10.7093   LearningRate 0.0521   Epoch: 5   Global Step: 69120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:14:27,093-Speed 2989.75 samples/sec   Loss 10.6327   LearningRate 0.0521   Epoch: 5   Global Step: 69130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:14:30,479-Speed 3025.19 samples/sec   Loss 10.5872   LearningRate 0.0521   Epoch: 5   Global Step: 69140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:14:33,848-Speed 3040.22 samples/sec   Loss 10.5254   LearningRate 0.0521   Epoch: 5   Global Step: 69150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:14:37,311-Speed 2957.51 samples/sec   Loss 10.8188   LearningRate 0.0521   Epoch: 5   Global Step: 69160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:14:40,633-Speed 3083.22 samples/sec   Loss 10.5811   LearningRate 0.0521   Epoch: 5   Global Step: 69170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:14:43,987-Speed 3054.00 samples/sec   Loss 10.6571   LearningRate 0.0521   Epoch: 5   Global Step: 69180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:14:47,363-Speed 3034.84 samples/sec   Loss 10.6873   LearningRate 0.0521   Epoch: 5   Global Step: 69190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:14:50,764-Speed 3011.66 samples/sec   Loss 10.5188   LearningRate 0.0520   Epoch: 5   Global Step: 69200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:14:54,219-Speed 2964.31 samples/sec   Loss 10.5958   LearningRate 0.0520   Epoch: 5   Global Step: 69210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:14:57,537-Speed 3087.36 samples/sec   Loss 10.5619   LearningRate 0.0520   Epoch: 5   Global Step: 69220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:00,936-Speed 3013.68 samples/sec   Loss 10.5571   LearningRate 0.0520   Epoch: 5   Global Step: 69230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:04,373-Speed 2979.71 samples/sec   Loss 10.5578   LearningRate 0.0520   Epoch: 5   Global Step: 69240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:07,790-Speed 2998.16 samples/sec   Loss 10.7596   LearningRate 0.0520   Epoch: 5   Global Step: 69250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:11,196-Speed 3006.96 samples/sec   Loss 10.7229   LearningRate 0.0520   Epoch: 5   Global Step: 69260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:14,586-Speed 3021.41 samples/sec   Loss 10.5617   LearningRate 0.0520   Epoch: 5   Global Step: 69270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:17,923-Speed 3069.50 samples/sec   Loss 10.6525   LearningRate 0.0520   Epoch: 5   Global Step: 69280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:21,333-Speed 3004.11 samples/sec   Loss 10.7201   LearningRate 0.0520   Epoch: 5   Global Step: 69290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:24,767-Speed 2982.14 samples/sec   Loss 10.4245   LearningRate 0.0520   Epoch: 5   Global Step: 69300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:28,230-Speed 2957.94 samples/sec   Loss 10.6497   LearningRate 0.0520   Epoch: 5   Global Step: 69310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:31,675-Speed 2973.60 samples/sec   Loss 10.7042   LearningRate 0.0520   Epoch: 5   Global Step: 69320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:35,043-Speed 3041.15 samples/sec   Loss 10.5257   LearningRate 0.0520   Epoch: 5   Global Step: 69330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:38,544-Speed 2925.83 samples/sec   Loss 10.7536   LearningRate 0.0520   Epoch: 5   Global Step: 69340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:41,974-Speed 2986.33 samples/sec   Loss 10.5048   LearningRate 0.0520   Epoch: 5   Global Step: 69350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:45,362-Speed 3023.45 samples/sec   Loss 10.6968   LearningRate 0.0520   Epoch: 5   Global Step: 69360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:48,721-Speed 3048.91 samples/sec   Loss 10.4959   LearningRate 0.0519   Epoch: 5   Global Step: 69370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:52,216-Speed 2930.97 samples/sec   Loss 10.6162   LearningRate 0.0519   Epoch: 5   Global Step: 69380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:55,624-Speed 3005.45 samples/sec   Loss 10.6434   LearningRate 0.0519   Epoch: 5   Global Step: 69390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:15:59,011-Speed 3024.63 samples/sec   Loss 10.5131   LearningRate 0.0519   Epoch: 5   Global Step: 69400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:02,482-Speed 2950.55 samples/sec   Loss 10.6165   LearningRate 0.0519   Epoch: 5   Global Step: 69410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:05,817-Speed 3071.61 samples/sec   Loss 10.6021   LearningRate 0.0519   Epoch: 5   Global Step: 69420   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:09,228-Speed 3002.83 samples/sec   Loss 10.4905   LearningRate 0.0519   Epoch: 5   Global Step: 69430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:12,630-Speed 3011.24 samples/sec   Loss 10.8399   LearningRate 0.0519   Epoch: 5   Global Step: 69440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:16,081-Speed 2967.92 samples/sec   Loss 10.6232   LearningRate 0.0519   Epoch: 5   Global Step: 69450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:19,438-Speed 3051.52 samples/sec   Loss 10.4809   LearningRate 0.0519   Epoch: 5   Global Step: 69460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:22,851-Speed 3001.19 samples/sec   Loss 10.6663   LearningRate 0.0519   Epoch: 5   Global Step: 69470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:26,234-Speed 3028.25 samples/sec   Loss 10.4670   LearningRate 0.0519   Epoch: 5   Global Step: 69480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:29,660-Speed 2989.09 samples/sec   Loss 10.5656   LearningRate 0.0519   Epoch: 5   Global Step: 69490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:33,077-Speed 2999.23 samples/sec   Loss 10.5798   LearningRate 0.0519   Epoch: 5   Global Step: 69500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:36,498-Speed 2994.63 samples/sec   Loss 10.7109   LearningRate 0.0519   Epoch: 5   Global Step: 69510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:39,894-Speed 3015.58 samples/sec   Loss 10.6718   LearningRate 0.0519   Epoch: 5   Global Step: 69520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:43,290-Speed 3017.24 samples/sec   Loss 10.6079   LearningRate 0.0519   Epoch: 5   Global Step: 69530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:46,709-Speed 2995.78 samples/sec   Loss 10.5118   LearningRate 0.0519   Epoch: 5   Global Step: 69540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:50,155-Speed 2971.95 samples/sec   Loss 10.5524   LearningRate 0.0518   Epoch: 5   Global Step: 69550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:53,533-Speed 3032.50 samples/sec   Loss 10.5674   LearningRate 0.0518   Epoch: 5   Global Step: 69560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:16:56,978-Speed 2973.60 samples/sec   Loss 10.6461   LearningRate 0.0518   Epoch: 5   Global Step: 69570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:17:00,426-Speed 2970.47 samples/sec   Loss 10.5011   LearningRate 0.0518   Epoch: 5   Global Step: 69580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:17:03,789-Speed 3045.64 samples/sec   Loss 10.6868   LearningRate 0.0518   Epoch: 5   Global Step: 69590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:17:07,221-Speed 2984.87 samples/sec   Loss 10.5245   LearningRate 0.0518   Epoch: 5   Global Step: 69600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:17:10,624-Speed 3010.34 samples/sec   Loss 10.6743   LearningRate 0.0518   Epoch: 5   Global Step: 69610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:17:13,980-Speed 3051.93 samples/sec   Loss 10.6838   LearningRate 0.0518   Epoch: 5   Global Step: 69620   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-04-27 08:17:17,383-Speed 3009.59 samples/sec   Loss 10.6226   LearningRate 0.0518   Epoch: 5   Global Step: 69630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:17:20,733-Speed 3058.23 samples/sec   Loss 10.6551   LearningRate 0.0518   Epoch: 5   Global Step: 69640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:17:24,110-Speed 3032.94 samples/sec   Loss 10.6634   LearningRate 0.0518   Epoch: 5   Global Step: 69650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:17:27,416-Speed 3098.01 samples/sec   Loss 10.6472   LearningRate 0.0518   Epoch: 5   Global Step: 69660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:17:30,834-Speed 2997.36 samples/sec   Loss 10.4444   LearningRate 0.0518   Epoch: 5   Global Step: 69670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:17:34,267-Speed 2983.29 samples/sec   Loss 10.5852   LearningRate 0.0518   Epoch: 5   Global Step: 69680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:17:37,647-Speed 3031.23 samples/sec   Loss 10.5666   LearningRate 0.0518   Epoch: 5   Global Step: 69690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:17:41,036-Speed 3022.12 samples/sec   Loss 10.6151   LearningRate 0.0518   Epoch: 5   Global Step: 69700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:17:44,351-Speed 3090.31 samples/sec   Loss 10.6790   LearningRate 0.0518   Epoch: 5   Global Step: 69710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:17:47,817-Speed 2955.34 samples/sec   Loss 10.6724   LearningRate 0.0517   Epoch: 5   Global Step: 69720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:17:51,270-Speed 2966.11 samples/sec   Loss 10.6730   LearningRate 0.0517   Epoch: 5   Global Step: 69730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:17:54,663-Speed 3019.73 samples/sec   Loss 10.5308   LearningRate 0.0517   Epoch: 5   Global Step: 69740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:17:58,067-Speed 3009.06 samples/sec   Loss 10.5820   LearningRate 0.0517   Epoch: 5   Global Step: 69750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:18:01,450-Speed 3027.74 samples/sec   Loss 10.6233   LearningRate 0.0517   Epoch: 5   Global Step: 69760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:18:04,868-Speed 2996.68 samples/sec   Loss 10.4945   LearningRate 0.0517   Epoch: 5   Global Step: 69770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:18:08,239-Speed 3038.53 samples/sec   Loss 10.5453   LearningRate 0.0517   Epoch: 5   Global Step: 69780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:18:11,646-Speed 3007.49 samples/sec   Loss 10.4664   LearningRate 0.0517   Epoch: 5   Global Step: 69790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:18:15,011-Speed 3043.92 samples/sec   Loss 10.5945   LearningRate 0.0517   Epoch: 5   Global Step: 69800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:18:18,429-Speed 2996.26 samples/sec   Loss 10.4083   LearningRate 0.0517   Epoch: 5   Global Step: 69810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:18:21,811-Speed 3028.44 samples/sec   Loss 10.4149   LearningRate 0.0517   Epoch: 5   Global Step: 69820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:18:25,318-Speed 2921.43 samples/sec   Loss 10.5983   LearningRate 0.0517   Epoch: 5   Global Step: 69830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:18:28,746-Speed 2987.31 samples/sec   Loss 10.4799   LearningRate 0.0517   Epoch: 5   Global Step: 69840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:18:32,118-Speed 3037.96 samples/sec   Loss 10.5334   LearningRate 0.0517   Epoch: 5   Global Step: 69850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:18:35,535-Speed 2997.84 samples/sec   Loss 10.4926   LearningRate 0.0517   Epoch: 5   Global Step: 69860   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-04-27 08:18:38,938-Speed 3010.31 samples/sec   Loss 10.6635   LearningRate 0.0517   Epoch: 5   Global Step: 69870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:18:42,310-Speed 3037.38 samples/sec   Loss 10.5801   LearningRate 0.0517   Epoch: 5   Global Step: 69880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:18:45,729-Speed 2995.78 samples/sec   Loss 10.5335   LearningRate 0.0516   Epoch: 5   Global Step: 69890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:18:49,091-Speed 3046.08 samples/sec   Loss 10.7556   LearningRate 0.0516   Epoch: 5   Global Step: 69900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:18:52,442-Speed 3057.66 samples/sec   Loss 10.6249   LearningRate 0.0516   Epoch: 5   Global Step: 69910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:18:55,841-Speed 3013.65 samples/sec   Loss 10.5901   LearningRate 0.0516   Epoch: 5   Global Step: 69920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:18:59,209-Speed 3041.26 samples/sec   Loss 10.5197   LearningRate 0.0516   Epoch: 5   Global Step: 69930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:02,620-Speed 3003.02 samples/sec   Loss 10.5370   LearningRate 0.0516   Epoch: 5   Global Step: 69940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:05,975-Speed 3052.84 samples/sec   Loss 10.4817   LearningRate 0.0516   Epoch: 5   Global Step: 69950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:09,403-Speed 2988.83 samples/sec   Loss 10.5898   LearningRate 0.0516   Epoch: 5   Global Step: 69960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:12,864-Speed 2959.26 samples/sec   Loss 10.6417   LearningRate 0.0516   Epoch: 5   Global Step: 69970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:16,356-Speed 2932.73 samples/sec   Loss 10.5264   LearningRate 0.0516   Epoch: 5   Global Step: 69980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:19:19,771-Speed 2999.35 samples/sec   Loss 10.4215   LearningRate 0.0516   Epoch: 5   Global Step: 69990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:19:23,208-Speed 2980.82 samples/sec   Loss 10.5037   LearningRate 0.0516   Epoch: 5   Global Step: 70000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:19:26,578-Speed 3039.13 samples/sec   Loss 10.5840   LearningRate 0.0516   Epoch: 5   Global Step: 70010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:29,952-Speed 3035.59 samples/sec   Loss 10.7062   LearningRate 0.0516   Epoch: 5   Global Step: 70020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:33,319-Speed 3042.43 samples/sec   Loss 10.5122   LearningRate 0.0516   Epoch: 5   Global Step: 70030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:36,802-Speed 2941.10 samples/sec   Loss 10.6049   LearningRate 0.0516   Epoch: 5   Global Step: 70040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:40,167-Speed 3044.41 samples/sec   Loss 10.5476   LearningRate 0.0516   Epoch: 5   Global Step: 70050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:43,573-Speed 3006.60 samples/sec   Loss 10.6239   LearningRate 0.0515   Epoch: 5   Global Step: 70060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:47,039-Speed 2956.20 samples/sec   Loss 10.5605   LearningRate 0.0515   Epoch: 5   Global Step: 70070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:50,368-Speed 3076.09 samples/sec   Loss 10.5203   LearningRate 0.0515   Epoch: 5   Global Step: 70080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:53,730-Speed 3048.16 samples/sec   Loss 10.4862   LearningRate 0.0515   Epoch: 5   Global Step: 70090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:19:57,188-Speed 2962.53 samples/sec   Loss 10.5130   LearningRate 0.0515   Epoch: 5   Global Step: 70100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:20:00,575-Speed 3024.22 samples/sec   Loss 10.7845   LearningRate 0.0515   Epoch: 5   Global Step: 70110   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:20:03,941-Speed 3042.88 samples/sec   Loss 10.5947   LearningRate 0.0515   Epoch: 5   Global Step: 70120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:20:07,282-Speed 3065.95 samples/sec   Loss 10.3987   LearningRate 0.0515   Epoch: 5   Global Step: 70130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:20:10,657-Speed 3035.42 samples/sec   Loss 10.4712   LearningRate 0.0515   Epoch: 5   Global Step: 70140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:20:14,009-Speed 3056.21 samples/sec   Loss 10.6206   LearningRate 0.0515   Epoch: 5   Global Step: 70150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:20:17,421-Speed 3001.45 samples/sec   Loss 10.4796   LearningRate 0.0515   Epoch: 5   Global Step: 70160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:20:20,834-Speed 3000.86 samples/sec   Loss 10.5846   LearningRate 0.0515   Epoch: 5   Global Step: 70170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:20:24,199-Speed 3044.21 samples/sec   Loss 10.4560   LearningRate 0.0515   Epoch: 5   Global Step: 70180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:20:27,598-Speed 3013.66 samples/sec   Loss 10.5127   LearningRate 0.0515   Epoch: 5   Global Step: 70190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:20:31,010-Speed 3001.68 samples/sec   Loss 10.5663   LearningRate 0.0515   Epoch: 5   Global Step: 70200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:20:34,424-Speed 3000.67 samples/sec   Loss 10.5004   LearningRate 0.0515   Epoch: 5   Global Step: 70210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:20:37,790-Speed 3043.01 samples/sec   Loss 10.5019   LearningRate 0.0515   Epoch: 5   Global Step: 70220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:20:41,181-Speed 3020.31 samples/sec   Loss 10.5046   LearningRate 0.0515   Epoch: 5   Global Step: 70230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:20:44,550-Speed 3040.54 samples/sec   Loss 10.5327   LearningRate 0.0514   Epoch: 5   Global Step: 70240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:20:47,937-Speed 3023.62 samples/sec   Loss 10.5039   LearningRate 0.0514   Epoch: 5   Global Step: 70250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:20:51,331-Speed 3018.29 samples/sec   Loss 10.6547   LearningRate 0.0514   Epoch: 5   Global Step: 70260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:20:54,806-Speed 2947.82 samples/sec   Loss 10.6417   LearningRate 0.0514   Epoch: 5   Global Step: 70270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:20:58,170-Speed 3044.69 samples/sec   Loss 10.6402   LearningRate 0.0514   Epoch: 5   Global Step: 70280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:01,575-Speed 3008.23 samples/sec   Loss 10.4684   LearningRate 0.0514   Epoch: 5   Global Step: 70290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:05,001-Speed 2989.92 samples/sec   Loss 10.4927   LearningRate 0.0514   Epoch: 5   Global Step: 70300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:08,444-Speed 2974.91 samples/sec   Loss 10.4119   LearningRate 0.0514   Epoch: 5   Global Step: 70310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:11,852-Speed 3005.21 samples/sec   Loss 10.4856   LearningRate 0.0514   Epoch: 5   Global Step: 70320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:15,340-Speed 2937.12 samples/sec   Loss 10.5032   LearningRate 0.0514   Epoch: 5   Global Step: 70330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:18,822-Speed 2941.23 samples/sec   Loss 10.5749   LearningRate 0.0514   Epoch: 5   Global Step: 70340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:22,172-Speed 3057.70 samples/sec   Loss 10.3565   LearningRate 0.0514   Epoch: 5   Global Step: 70350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:25,624-Speed 2967.10 samples/sec   Loss 10.6013   LearningRate 0.0514   Epoch: 5   Global Step: 70360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:28,976-Speed 3056.95 samples/sec   Loss 10.6555   LearningRate 0.0514   Epoch: 5   Global Step: 70370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:32,480-Speed 2922.94 samples/sec   Loss 10.6889   LearningRate 0.0514   Epoch: 5   Global Step: 70380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:35,888-Speed 3005.56 samples/sec   Loss 10.4752   LearningRate 0.0514   Epoch: 5   Global Step: 70390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:39,224-Speed 3071.06 samples/sec   Loss 10.5276   LearningRate 0.0514   Epoch: 5   Global Step: 70400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:21:42,602-Speed 3032.05 samples/sec   Loss 10.5899   LearningRate 0.0513   Epoch: 5   Global Step: 70410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:21:46,003-Speed 3011.97 samples/sec   Loss 10.5274   LearningRate 0.0513   Epoch: 5   Global Step: 70420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:21:49,416-Speed 3000.79 samples/sec   Loss 10.6030   LearningRate 0.0513   Epoch: 5   Global Step: 70430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:21:52,806-Speed 3021.97 samples/sec   Loss 10.5811   LearningRate 0.0513   Epoch: 5   Global Step: 70440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:21:56,135-Speed 3076.88 samples/sec   Loss 10.5085   LearningRate 0.0513   Epoch: 5   Global Step: 70450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:21:59,506-Speed 3038.27 samples/sec   Loss 10.5181   LearningRate 0.0513   Epoch: 5   Global Step: 70460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:02,881-Speed 3035.39 samples/sec   Loss 10.6635   LearningRate 0.0513   Epoch: 5   Global Step: 70470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:06,217-Speed 3069.98 samples/sec   Loss 10.5055   LearningRate 0.0513   Epoch: 5   Global Step: 70480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:09,642-Speed 2990.67 samples/sec   Loss 10.5972   LearningRate 0.0513   Epoch: 5   Global Step: 70490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:13,069-Speed 2988.99 samples/sec   Loss 10.4974   LearningRate 0.0513   Epoch: 5   Global Step: 70500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:16,459-Speed 3021.37 samples/sec   Loss 10.5343   LearningRate 0.0513   Epoch: 5   Global Step: 70510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:22:19,815-Speed 3052.40 samples/sec   Loss 10.5969   LearningRate 0.0513   Epoch: 5   Global Step: 70520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:22:23,162-Speed 3060.35 samples/sec   Loss 10.4654   LearningRate 0.0513   Epoch: 5   Global Step: 70530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:22:26,586-Speed 2991.39 samples/sec   Loss 10.4582   LearningRate 0.0513   Epoch: 5   Global Step: 70540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:22:29,948-Speed 3046.93 samples/sec   Loss 10.5752   LearningRate 0.0513   Epoch: 5   Global Step: 70550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:33,298-Speed 3058.06 samples/sec   Loss 10.6620   LearningRate 0.0513   Epoch: 5   Global Step: 70560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:36,751-Speed 2966.41 samples/sec   Loss 10.5088   LearningRate 0.0513   Epoch: 5   Global Step: 70570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:40,177-Speed 2990.16 samples/sec   Loss 10.6071   LearningRate 0.0512   Epoch: 5   Global Step: 70580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:43,595-Speed 2996.48 samples/sec   Loss 10.6257   LearningRate 0.0512   Epoch: 5   Global Step: 70590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:47,069-Speed 2948.64 samples/sec   Loss 10.5262   LearningRate 0.0512   Epoch: 5   Global Step: 70600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:50,502-Speed 2982.96 samples/sec   Loss 10.3705   LearningRate 0.0512   Epoch: 5   Global Step: 70610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:53,931-Speed 2987.53 samples/sec   Loss 10.4505   LearningRate 0.0512   Epoch: 5   Global Step: 70620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:22:57,360-Speed 2986.92 samples/sec   Loss 10.5163   LearningRate 0.0512   Epoch: 5   Global Step: 70630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:23:00,728-Speed 3041.35 samples/sec   Loss 10.4752   LearningRate 0.0512   Epoch: 5   Global Step: 70640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:23:04,066-Speed 3068.70 samples/sec   Loss 10.5961   LearningRate 0.0512   Epoch: 5   Global Step: 70650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:07,472-Speed 3007.32 samples/sec   Loss 10.2734   LearningRate 0.0512   Epoch: 5   Global Step: 70660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:10,794-Speed 3083.98 samples/sec   Loss 10.4093   LearningRate 0.0512   Epoch: 5   Global Step: 70670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:14,218-Speed 2990.89 samples/sec   Loss 10.5455   LearningRate 0.0512   Epoch: 5   Global Step: 70680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:17,547-Speed 3077.03 samples/sec   Loss 10.4759   LearningRate 0.0512   Epoch: 5   Global Step: 70690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:20,878-Speed 3075.00 samples/sec   Loss 10.4189   LearningRate 0.0512   Epoch: 5   Global Step: 70700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:24,262-Speed 3027.12 samples/sec   Loss 10.4982   LearningRate 0.0512   Epoch: 5   Global Step: 70710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:27,720-Speed 2962.41 samples/sec   Loss 10.5842   LearningRate 0.0512   Epoch: 5   Global Step: 70720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:31,163-Speed 2974.39 samples/sec   Loss 10.5386   LearningRate 0.0512   Epoch: 5   Global Step: 70730   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:34,611-Speed 2972.36 samples/sec   Loss 10.5009   LearningRate 0.0512   Epoch: 5   Global Step: 70740   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:37,979-Speed 3041.18 samples/sec   Loss 10.3940   LearningRate 0.0512   Epoch: 5   Global Step: 70750   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:41,420-Speed 2976.72 samples/sec   Loss 10.3894   LearningRate 0.0511   Epoch: 5   Global Step: 70760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:44,858-Speed 2979.31 samples/sec   Loss 10.5352   LearningRate 0.0511   Epoch: 5   Global Step: 70770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:48,252-Speed 3017.79 samples/sec   Loss 10.5488   LearningRate 0.0511   Epoch: 5   Global Step: 70780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:51,592-Speed 3067.11 samples/sec   Loss 10.3773   LearningRate 0.0511   Epoch: 5   Global Step: 70790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:54,923-Speed 3075.37 samples/sec   Loss 10.3292   LearningRate 0.0511   Epoch: 5   Global Step: 70800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:23:58,251-Speed 3077.53 samples/sec   Loss 10.6228   LearningRate 0.0511   Epoch: 5   Global Step: 70810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:24:01,627-Speed 3034.53 samples/sec   Loss 10.4681   LearningRate 0.0511   Epoch: 5   Global Step: 70820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:24:05,044-Speed 2997.12 samples/sec   Loss 10.4815   LearningRate 0.0511   Epoch: 5   Global Step: 70830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:24:08,417-Speed 3036.77 samples/sec   Loss 10.4859   LearningRate 0.0511   Epoch: 5   Global Step: 70840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:24:11,749-Speed 3074.39 samples/sec   Loss 10.4520   LearningRate 0.0511   Epoch: 5   Global Step: 70850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:24:15,172-Speed 2993.08 samples/sec   Loss 10.5323   LearningRate 0.0511   Epoch: 5   Global Step: 70860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:24:18,529-Speed 3050.76 samples/sec   Loss 10.4398   LearningRate 0.0511   Epoch: 5   Global Step: 70870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:24:21,915-Speed 3024.67 samples/sec   Loss 10.4723   LearningRate 0.0511   Epoch: 5   Global Step: 70880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:24:25,379-Speed 2957.30 samples/sec   Loss 10.3338   LearningRate 0.0511   Epoch: 5   Global Step: 70890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:24:28,773-Speed 3017.51 samples/sec   Loss 10.5076   LearningRate 0.0511   Epoch: 5   Global Step: 70900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:24:32,184-Speed 3003.06 samples/sec   Loss 10.4001   LearningRate 0.0511   Epoch: 5   Global Step: 70910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:24:35,632-Speed 2970.58 samples/sec   Loss 10.4811   LearningRate 0.0511   Epoch: 5   Global Step: 70920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:24:39,105-Speed 2949.74 samples/sec   Loss 10.3010   LearningRate 0.0510   Epoch: 5   Global Step: 70930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:24:42,498-Speed 3019.43 samples/sec   Loss 10.4153   LearningRate 0.0510   Epoch: 5   Global Step: 70940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:24:45,855-Speed 3050.62 samples/sec   Loss 10.6066   LearningRate 0.0510   Epoch: 5   Global Step: 70950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:24:49,241-Speed 3025.38 samples/sec   Loss 10.5711   LearningRate 0.0510   Epoch: 5   Global Step: 70960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:24:52,634-Speed 3018.93 samples/sec   Loss 10.4892   LearningRate 0.0510   Epoch: 5   Global Step: 70970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:24:56,051-Speed 2997.44 samples/sec   Loss 10.6682   LearningRate 0.0510   Epoch: 5   Global Step: 70980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:24:59,465-Speed 3000.00 samples/sec   Loss 10.4887   LearningRate 0.0510   Epoch: 5   Global Step: 70990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:25:02,839-Speed 3036.21 samples/sec   Loss 10.4588   LearningRate 0.0510   Epoch: 5   Global Step: 71000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:25:06,363-Speed 2906.97 samples/sec   Loss 10.5007   LearningRate 0.0510   Epoch: 5   Global Step: 71010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:25:09,756-Speed 3018.63 samples/sec   Loss 10.4127   LearningRate 0.0510   Epoch: 5   Global Step: 71020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:25:13,166-Speed 3003.79 samples/sec   Loss 10.4959   LearningRate 0.0510   Epoch: 5   Global Step: 71030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:25:16,558-Speed 3019.54 samples/sec   Loss 10.4928   LearningRate 0.0510   Epoch: 5   Global Step: 71040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:25:20,019-Speed 2960.10 samples/sec   Loss 10.4659   LearningRate 0.0510   Epoch: 5   Global Step: 71050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:25:23,436-Speed 2997.21 samples/sec   Loss 10.5250   LearningRate 0.0510   Epoch: 5   Global Step: 71060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:25:26,854-Speed 2996.87 samples/sec   Loss 10.5204   LearningRate 0.0510   Epoch: 5   Global Step: 71070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:25:30,239-Speed 3025.87 samples/sec   Loss 10.5804   LearningRate 0.0510   Epoch: 5   Global Step: 71080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:25:33,702-Speed 2958.30 samples/sec   Loss 10.4290   LearningRate 0.0510   Epoch: 5   Global Step: 71090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:25:37,108-Speed 3007.69 samples/sec   Loss 10.4970   LearningRate 0.0509   Epoch: 5   Global Step: 71100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:25:40,591-Speed 2940.73 samples/sec   Loss 10.4373   LearningRate 0.0509   Epoch: 5   Global Step: 71110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:25:44,080-Speed 2935.75 samples/sec   Loss 10.5351   LearningRate 0.0509   Epoch: 5   Global Step: 71120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:25:47,568-Speed 2936.57 samples/sec   Loss 10.5264   LearningRate 0.0509   Epoch: 5   Global Step: 71130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:25:51,016-Speed 2971.19 samples/sec   Loss 10.4332   LearningRate 0.0509   Epoch: 5   Global Step: 71140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:25:54,451-Speed 2982.48 samples/sec   Loss 10.6344   LearningRate 0.0509   Epoch: 5   Global Step: 71150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:25:57,806-Speed 3052.78 samples/sec   Loss 10.4920   LearningRate 0.0509   Epoch: 5   Global Step: 71160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:26:01,166-Speed 3048.24 samples/sec   Loss 10.5846   LearningRate 0.0509   Epoch: 5   Global Step: 71170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:26:04,509-Speed 3064.56 samples/sec   Loss 10.4241   LearningRate 0.0509   Epoch: 5   Global Step: 71180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:26:07,895-Speed 3024.48 samples/sec   Loss 10.4300   LearningRate 0.0509   Epoch: 5   Global Step: 71190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:26:11,243-Speed 3059.94 samples/sec   Loss 10.4073   LearningRate 0.0509   Epoch: 5   Global Step: 71200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:26:14,630-Speed 3023.52 samples/sec   Loss 10.5257   LearningRate 0.0509   Epoch: 5   Global Step: 71210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:26:18,070-Speed 2978.40 samples/sec   Loss 10.4744   LearningRate 0.0509   Epoch: 5   Global Step: 71220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:26:21,534-Speed 2956.53 samples/sec   Loss 10.4745   LearningRate 0.0509   Epoch: 5   Global Step: 71230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:26:24,851-Speed 3088.17 samples/sec   Loss 10.4919   LearningRate 0.0509   Epoch: 5   Global Step: 71240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:26:28,217-Speed 3042.76 samples/sec   Loss 10.5472   LearningRate 0.0509   Epoch: 5   Global Step: 71250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:26:31,543-Speed 3080.01 samples/sec   Loss 10.6030   LearningRate 0.0509   Epoch: 5   Global Step: 71260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:26:34,957-Speed 2999.61 samples/sec   Loss 10.4458   LearningRate 0.0509   Epoch: 5   Global Step: 71270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:26:38,411-Speed 2966.28 samples/sec   Loss 10.4434   LearningRate 0.0508   Epoch: 5   Global Step: 71280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:26:41,831-Speed 2994.93 samples/sec   Loss 10.3994   LearningRate 0.0508   Epoch: 5   Global Step: 71290   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:26:45,270-Speed 2978.16 samples/sec   Loss 10.5918   LearningRate 0.0508   Epoch: 5   Global Step: 71300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:26:48,743-Speed 2949.26 samples/sec   Loss 10.4094   LearningRate 0.0508   Epoch: 5   Global Step: 71310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:26:52,090-Speed 3060.36 samples/sec   Loss 10.5391   LearningRate 0.0508   Epoch: 5   Global Step: 71320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:26:55,496-Speed 3007.21 samples/sec   Loss 10.5028   LearningRate 0.0508   Epoch: 5   Global Step: 71330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:26:58,900-Speed 3009.30 samples/sec   Loss 10.6240   LearningRate 0.0508   Epoch: 5   Global Step: 71340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:27:02,306-Speed 3007.06 samples/sec   Loss 10.4756   LearningRate 0.0508   Epoch: 5   Global Step: 71350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:27:05,680-Speed 3037.00 samples/sec   Loss 10.4337   LearningRate 0.0508   Epoch: 5   Global Step: 71360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:27:09,178-Speed 2928.76 samples/sec   Loss 10.3458   LearningRate 0.0508   Epoch: 5   Global Step: 71370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:27:12,544-Speed 3042.64 samples/sec   Loss 10.4838   LearningRate 0.0508   Epoch: 5   Global Step: 71380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:27:15,948-Speed 3009.21 samples/sec   Loss 10.4382   LearningRate 0.0508   Epoch: 5   Global Step: 71390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:27:19,301-Speed 3055.77 samples/sec   Loss 10.4948   LearningRate 0.0508   Epoch: 5   Global Step: 71400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:27:22,789-Speed 2936.59 samples/sec   Loss 10.3554   LearningRate 0.0508   Epoch: 5   Global Step: 71410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:27:26,143-Speed 3053.72 samples/sec   Loss 10.5778   LearningRate 0.0508   Epoch: 5   Global Step: 71420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:27:29,552-Speed 3004.05 samples/sec   Loss 10.3816   LearningRate 0.0508   Epoch: 5   Global Step: 71430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:27:32,895-Speed 3064.56 samples/sec   Loss 10.4523   LearningRate 0.0508   Epoch: 5   Global Step: 71440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:27:36,269-Speed 3035.25 samples/sec   Loss 10.5072   LearningRate 0.0507   Epoch: 5   Global Step: 71450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:27:39,678-Speed 3005.04 samples/sec   Loss 10.4357   LearningRate 0.0507   Epoch: 5   Global Step: 71460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:27:43,059-Speed 3029.48 samples/sec   Loss 10.5619   LearningRate 0.0507   Epoch: 5   Global Step: 71470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:27:46,421-Speed 3046.72 samples/sec   Loss 10.3712   LearningRate 0.0507   Epoch: 5   Global Step: 71480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:27:49,781-Speed 3048.30 samples/sec   Loss 10.4065   LearningRate 0.0507   Epoch: 5   Global Step: 71490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:27:53,245-Speed 2957.44 samples/sec   Loss 10.2923   LearningRate 0.0507   Epoch: 5   Global Step: 71500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:27:56,695-Speed 2968.38 samples/sec   Loss 10.4433   LearningRate 0.0507   Epoch: 5   Global Step: 71510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:28:00,029-Speed 3072.35 samples/sec   Loss 10.5186   LearningRate 0.0507   Epoch: 5   Global Step: 71520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:28:03,383-Speed 3054.73 samples/sec   Loss 10.5153   LearningRate 0.0507   Epoch: 5   Global Step: 71530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:28:06,771-Speed 3023.31 samples/sec   Loss 10.4535   LearningRate 0.0507   Epoch: 5   Global Step: 71540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:28:10,197-Speed 2989.64 samples/sec   Loss 10.4274   LearningRate 0.0507   Epoch: 5   Global Step: 71550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:28:13,662-Speed 2955.90 samples/sec   Loss 10.4838   LearningRate 0.0507   Epoch: 5   Global Step: 71560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:28:17,048-Speed 3025.46 samples/sec   Loss 10.4260   LearningRate 0.0507   Epoch: 5   Global Step: 71570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:28:20,508-Speed 2959.81 samples/sec   Loss 10.4768   LearningRate 0.0507   Epoch: 5   Global Step: 71580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:28:23,944-Speed 2981.39 samples/sec   Loss 10.5295   LearningRate 0.0507   Epoch: 5   Global Step: 71590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:28:27,290-Speed 3061.00 samples/sec   Loss 10.4264   LearningRate 0.0507   Epoch: 5   Global Step: 71600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:28:30,618-Speed 3077.78 samples/sec   Loss 10.3983   LearningRate 0.0507   Epoch: 5   Global Step: 71610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:28:33,965-Speed 3062.32 samples/sec   Loss 10.3368   LearningRate 0.0507   Epoch: 5   Global Step: 71620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:28:37,373-Speed 3005.66 samples/sec   Loss 10.4078   LearningRate 0.0506   Epoch: 5   Global Step: 71630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:28:40,811-Speed 2978.60 samples/sec   Loss 10.3488   LearningRate 0.0506   Epoch: 5   Global Step: 71640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:28:44,278-Speed 2955.28 samples/sec   Loss 10.6279   LearningRate 0.0506   Epoch: 5   Global Step: 71650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:28:47,731-Speed 2966.28 samples/sec   Loss 10.3624   LearningRate 0.0506   Epoch: 5   Global Step: 71660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:28:51,123-Speed 3019.57 samples/sec   Loss 10.4145   LearningRate 0.0506   Epoch: 5   Global Step: 71670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:28:54,520-Speed 3015.56 samples/sec   Loss 10.3859   LearningRate 0.0506   Epoch: 5   Global Step: 71680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:28:57,947-Speed 2988.38 samples/sec   Loss 10.5627   LearningRate 0.0506   Epoch: 5   Global Step: 71690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:01,369-Speed 2993.68 samples/sec   Loss 10.4602   LearningRate 0.0506   Epoch: 5   Global Step: 71700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:04,771-Speed 3011.11 samples/sec   Loss 10.4124   LearningRate 0.0506   Epoch: 5   Global Step: 71710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:08,161-Speed 3021.39 samples/sec   Loss 10.4194   LearningRate 0.0506   Epoch: 5   Global Step: 71720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:11,618-Speed 2962.52 samples/sec   Loss 10.2698   LearningRate 0.0506   Epoch: 5   Global Step: 71730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:15,002-Speed 3026.84 samples/sec   Loss 10.3914   LearningRate 0.0506   Epoch: 5   Global Step: 71740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:18,409-Speed 3006.38 samples/sec   Loss 10.3225   LearningRate 0.0506   Epoch: 5   Global Step: 71750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:21,781-Speed 3037.93 samples/sec   Loss 10.4794   LearningRate 0.0506   Epoch: 5   Global Step: 71760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:25,208-Speed 2989.28 samples/sec   Loss 10.4131   LearningRate 0.0506   Epoch: 5   Global Step: 71770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:28,553-Speed 3061.73 samples/sec   Loss 10.3624   LearningRate 0.0506   Epoch: 5   Global Step: 71780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:29:31,880-Speed 3079.06 samples/sec   Loss 10.5639   LearningRate 0.0506   Epoch: 5   Global Step: 71790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:35,251-Speed 3038.64 samples/sec   Loss 10.2158   LearningRate 0.0505   Epoch: 5   Global Step: 71800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:38,599-Speed 3058.67 samples/sec   Loss 10.3229   LearningRate 0.0505   Epoch: 5   Global Step: 71810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:41,955-Speed 3052.24 samples/sec   Loss 10.4455   LearningRate 0.0505   Epoch: 5   Global Step: 71820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:45,355-Speed 3012.86 samples/sec   Loss 10.4014   LearningRate 0.0505   Epoch: 5   Global Step: 71830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:48,729-Speed 3036.01 samples/sec   Loss 10.4037   LearningRate 0.0505   Epoch: 5   Global Step: 71840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:52,128-Speed 3013.28 samples/sec   Loss 10.4106   LearningRate 0.0505   Epoch: 5   Global Step: 71850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:55,504-Speed 3034.09 samples/sec   Loss 10.4279   LearningRate 0.0505   Epoch: 5   Global Step: 71860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:29:58,933-Speed 2987.29 samples/sec   Loss 10.3890   LearningRate 0.0505   Epoch: 5   Global Step: 71870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:30:02,366-Speed 2983.39 samples/sec   Loss 10.3334   LearningRate 0.0505   Epoch: 5   Global Step: 71880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:30:05,743-Speed 3033.43 samples/sec   Loss 10.4499   LearningRate 0.0505   Epoch: 5   Global Step: 71890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:30:09,121-Speed 3032.35 samples/sec   Loss 10.4807   LearningRate 0.0505   Epoch: 5   Global Step: 71900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:30:12,539-Speed 2997.27 samples/sec   Loss 10.4394   LearningRate 0.0505   Epoch: 5   Global Step: 71910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:30:16,030-Speed 2933.86 samples/sec   Loss 10.3688   LearningRate 0.0505   Epoch: 5   Global Step: 71920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:30:19,453-Speed 2992.81 samples/sec   Loss 10.4618   LearningRate 0.0505   Epoch: 5   Global Step: 71930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:30:22,848-Speed 3016.89 samples/sec   Loss 10.3946   LearningRate 0.0505   Epoch: 5   Global Step: 71940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:30:26,159-Speed 3093.86 samples/sec   Loss 10.4964   LearningRate 0.0505   Epoch: 5   Global Step: 71950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:30:29,540-Speed 3029.95 samples/sec   Loss 10.3406   LearningRate 0.0505   Epoch: 5   Global Step: 71960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:30:32,919-Speed 3030.90 samples/sec   Loss 10.3740   LearningRate 0.0505   Epoch: 5   Global Step: 71970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:30:36,322-Speed 3009.82 samples/sec   Loss 10.2919   LearningRate 0.0504   Epoch: 5   Global Step: 71980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:30:39,745-Speed 2992.71 samples/sec   Loss 10.4710   LearningRate 0.0504   Epoch: 5   Global Step: 71990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:30:43,129-Speed 3026.91 samples/sec   Loss 10.4038   LearningRate 0.0504   Epoch: 5   Global Step: 72000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:30:46,518-Speed 3022.82 samples/sec   Loss 10.5169   LearningRate 0.0504   Epoch: 5   Global Step: 72010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:30:49,947-Speed 2987.09 samples/sec   Loss 10.5074   LearningRate 0.0504   Epoch: 5   Global Step: 72020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:30:53,366-Speed 2995.63 samples/sec   Loss 10.5658   LearningRate 0.0504   Epoch: 5   Global Step: 72030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:30:56,744-Speed 3032.77 samples/sec   Loss 10.4648   LearningRate 0.0504   Epoch: 5   Global Step: 72040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:31:00,172-Speed 2987.78 samples/sec   Loss 10.4619   LearningRate 0.0504   Epoch: 5   Global Step: 72050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:31:03,554-Speed 3028.84 samples/sec   Loss 10.5254   LearningRate 0.0504   Epoch: 5   Global Step: 72060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:31:06,886-Speed 3074.50 samples/sec   Loss 10.4253   LearningRate 0.0504   Epoch: 5   Global Step: 72070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:31:10,205-Speed 3085.77 samples/sec   Loss 10.3156   LearningRate 0.0504   Epoch: 5   Global Step: 72080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:31:13,667-Speed 2958.88 samples/sec   Loss 10.5082   LearningRate 0.0504   Epoch: 5   Global Step: 72090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:31:17,050-Speed 3027.85 samples/sec   Loss 10.2629   LearningRate 0.0504   Epoch: 5   Global Step: 72100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:31:20,448-Speed 3015.06 samples/sec   Loss 10.3715   LearningRate 0.0504   Epoch: 5   Global Step: 72110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:31:23,837-Speed 3022.38 samples/sec   Loss 10.3280   LearningRate 0.0504   Epoch: 5   Global Step: 72120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:31:27,234-Speed 3014.88 samples/sec   Loss 10.3258   LearningRate 0.0504   Epoch: 5   Global Step: 72130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:31:30,576-Speed 3064.78 samples/sec   Loss 10.3709   LearningRate 0.0504   Epoch: 5   Global Step: 72140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:31:33,979-Speed 3010.14 samples/sec   Loss 10.5082   LearningRate 0.0503   Epoch: 5   Global Step: 72150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:31:37,403-Speed 2991.76 samples/sec   Loss 10.4847   LearningRate 0.0503   Epoch: 5   Global Step: 72160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:31:40,819-Speed 2998.21 samples/sec   Loss 10.4222   LearningRate 0.0503   Epoch: 5   Global Step: 72170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:31:45,033-Speed 2430.31 samples/sec   Loss 10.3397   LearningRate 0.0503   Epoch: 5   Global Step: 72180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:31:48,442-Speed 3005.62 samples/sec   Loss 10.4122   LearningRate 0.0503   Epoch: 5   Global Step: 72190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:31:53,121-Speed 2188.88 samples/sec   Loss 10.3602   LearningRate 0.0503   Epoch: 5   Global Step: 72200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:31:57,129-Speed 2555.73 samples/sec   Loss 10.4978   LearningRate 0.0503   Epoch: 5   Global Step: 72210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:32:00,479-Speed 3058.21 samples/sec   Loss 10.3418   LearningRate 0.0503   Epoch: 5   Global Step: 72220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:32:04,590-Speed 2491.36 samples/sec   Loss 10.4354   LearningRate 0.0503   Epoch: 5   Global Step: 72230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:32:07,978-Speed 3023.06 samples/sec   Loss 10.3518   LearningRate 0.0503   Epoch: 5   Global Step: 72240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:32:11,401-Speed 2991.98 samples/sec   Loss 10.4671   LearningRate 0.0503   Epoch: 5   Global Step: 72250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 08:32:14,748-Speed 3060.88 samples/sec   Loss 10.2545   LearningRate 0.0503   Epoch: 5   Global Step: 72260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:32:18,154-Speed 3007.28 samples/sec   Loss 10.4326   LearningRate 0.0503   Epoch: 5   Global Step: 72270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:32:21,530-Speed 3034.61 samples/sec   Loss 10.4205   LearningRate 0.0503   Epoch: 5   Global Step: 72280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:32:24,959-Speed 2987.15 samples/sec   Loss 10.5108   LearningRate 0.0503   Epoch: 5   Global Step: 72290   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:32:28,358-Speed 3013.18 samples/sec   Loss 10.2528   LearningRate 0.0503   Epoch: 5   Global Step: 72300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 08:32:31,759-Speed 3011.89 samples/sec   Loss 10.3837   LearningRate 0.0503   Epoch: 5   Global Step: 72310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:32:35,131-Speed 3037.38 samples/sec   Loss 10.3827   LearningRate 0.0503   Epoch: 5   Global Step: 72320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:32:38,546-Speed 3000.18 samples/sec   Loss 10.3095   LearningRate 0.0502   Epoch: 5   Global Step: 72330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 08:32:41,940-Speed 3017.80 samples/sec   Loss 10.3826   LearningRate 0.0502   Epoch: 5   Global Step: 72340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:32:45,303-Speed 3045.09 samples/sec   Loss 10.2963   LearningRate 0.0502   Epoch: 5   Global Step: 72350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:32:48,694-Speed 3021.16 samples/sec   Loss 10.3069   LearningRate 0.0502   Epoch: 5   Global Step: 72360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:32:52,063-Speed 3040.04 samples/sec   Loss 10.5349   LearningRate 0.0502   Epoch: 5   Global Step: 72370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:32:55,408-Speed 3062.80 samples/sec   Loss 10.4027   LearningRate 0.0502   Epoch: 5   Global Step: 72380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:32:58,770-Speed 3046.85 samples/sec   Loss 10.3717   LearningRate 0.0502   Epoch: 5   Global Step: 72390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:33:02,211-Speed 2976.27 samples/sec   Loss 10.4102   LearningRate 0.0502   Epoch: 5   Global Step: 72400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:33:05,590-Speed 3031.32 samples/sec   Loss 10.4473   LearningRate 0.0502   Epoch: 5   Global Step: 72410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:33:08,921-Speed 3074.58 samples/sec   Loss 10.3473   LearningRate 0.0502   Epoch: 5   Global Step: 72420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:33:12,351-Speed 2986.76 samples/sec   Loss 10.3422   LearningRate 0.0502   Epoch: 5   Global Step: 72430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:33:15,775-Speed 2992.68 samples/sec   Loss 10.3371   LearningRate 0.0502   Epoch: 5   Global Step: 72440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:33:19,138-Speed 3046.13 samples/sec   Loss 10.5771   LearningRate 0.0502   Epoch: 5   Global Step: 72450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:33:22,461-Speed 3081.70 samples/sec   Loss 10.3407   LearningRate 0.0502   Epoch: 5   Global Step: 72460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:33:25,846-Speed 3026.20 samples/sec   Loss 10.4264   LearningRate 0.0502   Epoch: 5   Global Step: 72470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:33:29,214-Speed 3041.38 samples/sec   Loss 10.3647   LearningRate 0.0502   Epoch: 5   Global Step: 72480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:33:32,579-Speed 3043.93 samples/sec   Loss 10.4161   LearningRate 0.0502   Epoch: 5   Global Step: 72490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:33:35,912-Speed 3073.49 samples/sec   Loss 10.4541   LearningRate 0.0501   Epoch: 5   Global Step: 72500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:33:39,302-Speed 3021.73 samples/sec   Loss 10.3491   LearningRate 0.0501   Epoch: 5   Global Step: 72510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:33:42,622-Speed 3085.23 samples/sec   Loss 10.3886   LearningRate 0.0501   Epoch: 5   Global Step: 72520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:33:46,068-Speed 2972.36 samples/sec   Loss 10.3699   LearningRate 0.0501   Epoch: 5   Global Step: 72530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:33:49,460-Speed 3019.77 samples/sec   Loss 10.3208   LearningRate 0.0501   Epoch: 5   Global Step: 72540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:33:52,842-Speed 3029.09 samples/sec   Loss 10.4145   LearningRate 0.0501   Epoch: 5   Global Step: 72550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:33:56,208-Speed 3043.42 samples/sec   Loss 10.2853   LearningRate 0.0501   Epoch: 5   Global Step: 72560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:33:59,573-Speed 3044.21 samples/sec   Loss 10.2592   LearningRate 0.0501   Epoch: 5   Global Step: 72570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:34:04,211-Speed 2208.13 samples/sec   Loss 10.2738   LearningRate 0.0501   Epoch: 5   Global Step: 72580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:34:07,648-Speed 2980.70 samples/sec   Loss 10.3501   LearningRate 0.0501   Epoch: 5   Global Step: 72590   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:34:11,060-Speed 3001.63 samples/sec   Loss 10.4587   LearningRate 0.0501   Epoch: 5   Global Step: 72600   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:34:14,445-Speed 3026.71 samples/sec   Loss 10.3599   LearningRate 0.0501   Epoch: 5   Global Step: 72610   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:34:17,850-Speed 3008.00 samples/sec   Loss 10.2808   LearningRate 0.0501   Epoch: 5   Global Step: 72620   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:34:21,260-Speed 3004.44 samples/sec   Loss 10.4352   LearningRate 0.0501   Epoch: 5   Global Step: 72630   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:34:24,585-Speed 3080.22 samples/sec   Loss 10.4621   LearningRate 0.0501   Epoch: 5   Global Step: 72640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:34:27,971-Speed 3024.77 samples/sec   Loss 10.3826   LearningRate 0.0501   Epoch: 5   Global Step: 72650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:34:31,290-Speed 3086.95 samples/sec   Loss 10.4165   LearningRate 0.0501   Epoch: 5   Global Step: 72660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:34:34,588-Speed 3104.86 samples/sec   Loss 10.2374   LearningRate 0.0501   Epoch: 5   Global Step: 72670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:34:37,915-Speed 3078.99 samples/sec   Loss 10.3428   LearningRate 0.0500   Epoch: 5   Global Step: 72680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:34:41,337-Speed 2993.67 samples/sec   Loss 10.3632   LearningRate 0.0500   Epoch: 5   Global Step: 72690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:34:44,666-Speed 3077.74 samples/sec   Loss 10.4557   LearningRate 0.0500   Epoch: 5   Global Step: 72700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:34:48,042-Speed 3034.01 samples/sec   Loss 10.5162   LearningRate 0.0500   Epoch: 5   Global Step: 72710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:34:51,529-Speed 2937.71 samples/sec   Loss 10.4015   LearningRate 0.0500   Epoch: 5   Global Step: 72720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:34:54,964-Speed 2981.78 samples/sec   Loss 10.4715   LearningRate 0.0500   Epoch: 5   Global Step: 72730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:34:58,350-Speed 3025.36 samples/sec   Loss 10.3654   LearningRate 0.0500   Epoch: 5   Global Step: 72740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:35:01,763-Speed 3000.83 samples/sec   Loss 10.4574   LearningRate 0.0500   Epoch: 5   Global Step: 72750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:35:05,114-Speed 3056.83 samples/sec   Loss 10.4467   LearningRate 0.0500   Epoch: 5   Global Step: 72760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:35:08,524-Speed 3003.73 samples/sec   Loss 10.4942   LearningRate 0.0500   Epoch: 5   Global Step: 72770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:35:11,877-Speed 3055.17 samples/sec   Loss 10.2488   LearningRate 0.0500   Epoch: 5   Global Step: 72780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:35:15,200-Speed 3082.55 samples/sec   Loss 10.3627   LearningRate 0.0500   Epoch: 5   Global Step: 72790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:35:18,638-Speed 2979.01 samples/sec   Loss 10.3235   LearningRate 0.0500   Epoch: 5   Global Step: 72800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:35:22,077-Speed 2978.79 samples/sec   Loss 10.2140   LearningRate 0.0500   Epoch: 5   Global Step: 72810   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:35:25,443-Speed 3042.73 samples/sec   Loss 10.2631   LearningRate 0.0500   Epoch: 5   Global Step: 72820   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:35:28,926-Speed 2940.80 samples/sec   Loss 10.3067   LearningRate 0.0500   Epoch: 5   Global Step: 72830   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:35:32,411-Speed 2939.14 samples/sec   Loss 10.3584   LearningRate 0.0500   Epoch: 5   Global Step: 72840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:35:35,832-Speed 2994.48 samples/sec   Loss 10.4718   LearningRate 0.0499   Epoch: 5   Global Step: 72850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:35:39,232-Speed 3012.77 samples/sec   Loss 10.3875   LearningRate 0.0499   Epoch: 5   Global Step: 72860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:35:42,617-Speed 3025.79 samples/sec   Loss 10.3112   LearningRate 0.0499   Epoch: 5   Global Step: 72870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:35:46,053-Speed 2981.17 samples/sec   Loss 10.2045   LearningRate 0.0499   Epoch: 5   Global Step: 72880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:35:49,532-Speed 2944.30 samples/sec   Loss 10.3680   LearningRate 0.0499   Epoch: 5   Global Step: 72890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:35:52,910-Speed 3031.87 samples/sec   Loss 10.4006   LearningRate 0.0499   Epoch: 5   Global Step: 72900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:35:56,288-Speed 3032.81 samples/sec   Loss 10.2676   LearningRate 0.0499   Epoch: 5   Global Step: 72910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:35:59,641-Speed 3054.10 samples/sec   Loss 10.1740   LearningRate 0.0499   Epoch: 5   Global Step: 72920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:02,992-Speed 3057.45 samples/sec   Loss 10.3516   LearningRate 0.0499   Epoch: 5   Global Step: 72930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:06,362-Speed 3039.32 samples/sec   Loss 10.3908   LearningRate 0.0499   Epoch: 5   Global Step: 72940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:36:09,750-Speed 3023.21 samples/sec   Loss 10.3318   LearningRate 0.0499   Epoch: 5   Global Step: 72950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:13,270-Speed 2910.01 samples/sec   Loss 10.2788   LearningRate 0.0499   Epoch: 5   Global Step: 72960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:16,665-Speed 3016.56 samples/sec   Loss 10.2621   LearningRate 0.0499   Epoch: 5   Global Step: 72970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:20,073-Speed 3005.96 samples/sec   Loss 10.4643   LearningRate 0.0499   Epoch: 5   Global Step: 72980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:23,415-Speed 3064.98 samples/sec   Loss 10.3525   LearningRate 0.0499   Epoch: 5   Global Step: 72990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:26,755-Speed 3067.15 samples/sec   Loss 10.2337   LearningRate 0.0499   Epoch: 5   Global Step: 73000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:30,166-Speed 3002.76 samples/sec   Loss 10.1478   LearningRate 0.0499   Epoch: 5   Global Step: 73010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:33,627-Speed 2959.51 samples/sec   Loss 10.5560   LearningRate 0.0499   Epoch: 5   Global Step: 73020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:37,141-Speed 2915.07 samples/sec   Loss 10.4411   LearningRate 0.0498   Epoch: 5   Global Step: 73030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:40,570-Speed 2986.71 samples/sec   Loss 10.4128   LearningRate 0.0498   Epoch: 5   Global Step: 73040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:44,002-Speed 2984.63 samples/sec   Loss 10.3939   LearningRate 0.0498   Epoch: 5   Global Step: 73050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:36:47,424-Speed 2992.58 samples/sec   Loss 10.4430   LearningRate 0.0498   Epoch: 5   Global Step: 73060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:50,863-Speed 2978.43 samples/sec   Loss 10.2894   LearningRate 0.0498   Epoch: 5   Global Step: 73070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:54,226-Speed 3045.88 samples/sec   Loss 10.2948   LearningRate 0.0498   Epoch: 5   Global Step: 73080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:36:57,563-Speed 3069.92 samples/sec   Loss 10.3296   LearningRate 0.0498   Epoch: 5   Global Step: 73090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:37:00,966-Speed 3010.35 samples/sec   Loss 10.4269   LearningRate 0.0498   Epoch: 5   Global Step: 73100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:37:04,328-Speed 3045.96 samples/sec   Loss 10.2462   LearningRate 0.0498   Epoch: 5   Global Step: 73110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:37:07,705-Speed 3033.44 samples/sec   Loss 10.4742   LearningRate 0.0498   Epoch: 5   Global Step: 73120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:37:11,088-Speed 3027.73 samples/sec   Loss 10.2588   LearningRate 0.0498   Epoch: 5   Global Step: 73130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:37:14,401-Speed 3092.50 samples/sec   Loss 10.3065   LearningRate 0.0498   Epoch: 5   Global Step: 73140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:37:17,813-Speed 3001.15 samples/sec   Loss 10.4049   LearningRate 0.0498   Epoch: 5   Global Step: 73150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:37:21,212-Speed 3014.10 samples/sec   Loss 10.1118   LearningRate 0.0498   Epoch: 5   Global Step: 73160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:37:24,575-Speed 3045.91 samples/sec   Loss 10.2891   LearningRate 0.0498   Epoch: 5   Global Step: 73170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:37:27,948-Speed 3036.66 samples/sec   Loss 10.3075   LearningRate 0.0498   Epoch: 5   Global Step: 73180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:37:31,286-Speed 3068.08 samples/sec   Loss 10.2967   LearningRate 0.0498   Epoch: 5   Global Step: 73190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:37:34,660-Speed 3036.07 samples/sec   Loss 10.4052   LearningRate 0.0498   Epoch: 5   Global Step: 73200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:37:38,006-Speed 3061.28 samples/sec   Loss 10.3757   LearningRate 0.0497   Epoch: 5   Global Step: 73210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:37:41,361-Speed 3053.12 samples/sec   Loss 10.2287   LearningRate 0.0497   Epoch: 5   Global Step: 73220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:37:44,747-Speed 3025.31 samples/sec   Loss 10.1888   LearningRate 0.0497   Epoch: 5   Global Step: 73230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:37:48,134-Speed 3023.39 samples/sec   Loss 10.3929   LearningRate 0.0497   Epoch: 5   Global Step: 73240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:37:51,528-Speed 3017.90 samples/sec   Loss 10.2948   LearningRate 0.0497   Epoch: 5   Global Step: 73250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:37:54,842-Speed 3090.88 samples/sec   Loss 10.3308   LearningRate 0.0497   Epoch: 5   Global Step: 73260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:37:58,173-Speed 3075.03 samples/sec   Loss 10.3851   LearningRate 0.0497   Epoch: 5   Global Step: 73270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:38:01,632-Speed 2960.89 samples/sec   Loss 10.3053   LearningRate 0.0497   Epoch: 5   Global Step: 73280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:38:05,054-Speed 2993.66 samples/sec   Loss 10.2853   LearningRate 0.0497   Epoch: 5   Global Step: 73290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:38:08,470-Speed 2998.15 samples/sec   Loss 10.3600   LearningRate 0.0497   Epoch: 5   Global Step: 73300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:38:11,843-Speed 3036.67 samples/sec   Loss 10.2980   LearningRate 0.0497   Epoch: 5   Global Step: 73310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:38:15,302-Speed 2961.09 samples/sec   Loss 10.2996   LearningRate 0.0497   Epoch: 5   Global Step: 73320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:38:18,624-Speed 3084.13 samples/sec   Loss 10.4179   LearningRate 0.0497   Epoch: 5   Global Step: 73330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:38:21,998-Speed 3034.87 samples/sec   Loss 10.3767   LearningRate 0.0497   Epoch: 5   Global Step: 73340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:38:25,396-Speed 3014.31 samples/sec   Loss 10.3097   LearningRate 0.0497   Epoch: 5   Global Step: 73350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:38:28,771-Speed 3035.34 samples/sec   Loss 10.2880   LearningRate 0.0497   Epoch: 5   Global Step: 73360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:38:32,120-Speed 3058.38 samples/sec   Loss 10.1940   LearningRate 0.0497   Epoch: 5   Global Step: 73370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:38:35,530-Speed 3004.10 samples/sec   Loss 10.2957   LearningRate 0.0496   Epoch: 5   Global Step: 73380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:38:38,919-Speed 3022.18 samples/sec   Loss 10.2298   LearningRate 0.0496   Epoch: 5   Global Step: 73390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:38:42,339-Speed 2994.95 samples/sec   Loss 10.4740   LearningRate 0.0496   Epoch: 5   Global Step: 73400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:38:45,705-Speed 3043.61 samples/sec   Loss 10.3303   LearningRate 0.0496   Epoch: 5   Global Step: 73410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:38:49,027-Speed 3082.96 samples/sec   Loss 10.2071   LearningRate 0.0496   Epoch: 5   Global Step: 73420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:38:52,391-Speed 3045.10 samples/sec   Loss 10.2561   LearningRate 0.0496   Epoch: 5   Global Step: 73430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:38:55,783-Speed 3019.45 samples/sec   Loss 10.1875   LearningRate 0.0496   Epoch: 5   Global Step: 73440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:38:59,138-Speed 3053.77 samples/sec   Loss 10.1387   LearningRate 0.0496   Epoch: 5   Global Step: 73450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:39:02,477-Speed 3067.37 samples/sec   Loss 10.1845   LearningRate 0.0496   Epoch: 5   Global Step: 73460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:39:05,799-Speed 3083.20 samples/sec   Loss 10.2722   LearningRate 0.0496   Epoch: 5   Global Step: 73470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:39:09,180-Speed 3030.35 samples/sec   Loss 10.2357   LearningRate 0.0496   Epoch: 5   Global Step: 73480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:39:12,496-Speed 3088.34 samples/sec   Loss 10.3682   LearningRate 0.0496   Epoch: 5   Global Step: 73490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:39:15,849-Speed 3055.38 samples/sec   Loss 10.1973   LearningRate 0.0496   Epoch: 5   Global Step: 73500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:39:19,118-Speed 3132.86 samples/sec   Loss 10.4214   LearningRate 0.0496   Epoch: 5   Global Step: 73510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:39:22,482-Speed 3044.74 samples/sec   Loss 10.2996   LearningRate 0.0496   Epoch: 5   Global Step: 73520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:39:25,808-Speed 3080.15 samples/sec   Loss 10.2957   LearningRate 0.0496   Epoch: 5   Global Step: 73530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:39:29,118-Speed 3094.16 samples/sec   Loss 10.4521   LearningRate 0.0496   Epoch: 5   Global Step: 73540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:39:32,509-Speed 3020.93 samples/sec   Loss 10.3228   LearningRate 0.0496   Epoch: 5   Global Step: 73550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:39:35,890-Speed 3029.07 samples/sec   Loss 10.2694   LearningRate 0.0495   Epoch: 5   Global Step: 73560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:39:39,347-Speed 2962.61 samples/sec   Loss 10.4077   LearningRate 0.0495   Epoch: 5   Global Step: 73570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:39:42,784-Speed 2980.83 samples/sec   Loss 10.3145   LearningRate 0.0495   Epoch: 5   Global Step: 73580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:39:46,126-Speed 3064.14 samples/sec   Loss 10.3809   LearningRate 0.0495   Epoch: 5   Global Step: 73590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:39:49,487-Speed 3047.55 samples/sec   Loss 10.3297   LearningRate 0.0495   Epoch: 5   Global Step: 73600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:39:52,836-Speed 3058.74 samples/sec   Loss 10.2545   LearningRate 0.0495   Epoch: 5   Global Step: 73610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:39:56,352-Speed 2912.94 samples/sec   Loss 10.1957   LearningRate 0.0495   Epoch: 5   Global Step: 73620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:39:59,729-Speed 3033.49 samples/sec   Loss 10.3972   LearningRate 0.0495   Epoch: 5   Global Step: 73630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:40:03,100-Speed 3037.92 samples/sec   Loss 10.3055   LearningRate 0.0495   Epoch: 5   Global Step: 73640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:40:06,467-Speed 3042.51 samples/sec   Loss 10.2458   LearningRate 0.0495   Epoch: 5   Global Step: 73650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:40:09,888-Speed 2993.89 samples/sec   Loss 10.1908   LearningRate 0.0495   Epoch: 5   Global Step: 73660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:40:13,261-Speed 3036.67 samples/sec   Loss 10.2245   LearningRate 0.0495   Epoch: 5   Global Step: 73670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:40:16,569-Speed 3096.70 samples/sec   Loss 10.4708   LearningRate 0.0495   Epoch: 5   Global Step: 73680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:40:19,886-Speed 3088.15 samples/sec   Loss 10.2948   LearningRate 0.0495   Epoch: 5   Global Step: 73690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:40:23,260-Speed 3035.49 samples/sec   Loss 10.4001   LearningRate 0.0495   Epoch: 5   Global Step: 73700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:40:26,691-Speed 2985.44 samples/sec   Loss 10.3691   LearningRate 0.0495   Epoch: 5   Global Step: 73710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:40:30,064-Speed 3036.41 samples/sec   Loss 10.2779   LearningRate 0.0495   Epoch: 5   Global Step: 73720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:40:33,416-Speed 3055.86 samples/sec   Loss 10.3558   LearningRate 0.0494   Epoch: 5   Global Step: 73730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:40:36,792-Speed 3034.00 samples/sec   Loss 10.2807   LearningRate 0.0494   Epoch: 5   Global Step: 73740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:40:40,160-Speed 3041.82 samples/sec   Loss 10.3151   LearningRate 0.0494   Epoch: 5   Global Step: 73750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:40:43,540-Speed 3029.88 samples/sec   Loss 10.2772   LearningRate 0.0494   Epoch: 5   Global Step: 73760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:40:46,938-Speed 3015.00 samples/sec   Loss 10.2474   LearningRate 0.0494   Epoch: 5   Global Step: 73770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:40:50,321-Speed 3027.56 samples/sec   Loss 10.4037   LearningRate 0.0494   Epoch: 5   Global Step: 73780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:40:53,802-Speed 2942.14 samples/sec   Loss 10.2101   LearningRate 0.0494   Epoch: 5   Global Step: 73790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:40:57,264-Speed 2959.49 samples/sec   Loss 10.2783   LearningRate 0.0494   Epoch: 5   Global Step: 73800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:41:00,665-Speed 3011.80 samples/sec   Loss 10.3216   LearningRate 0.0494   Epoch: 5   Global Step: 73810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:41:04,064-Speed 3013.84 samples/sec   Loss 10.4767   LearningRate 0.0494   Epoch: 5   Global Step: 73820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:41:07,482-Speed 2997.52 samples/sec   Loss 10.3472   LearningRate 0.0494   Epoch: 5   Global Step: 73830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:41:10,836-Speed 3053.91 samples/sec   Loss 10.4145   LearningRate 0.0494   Epoch: 5   Global Step: 73840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:41:14,203-Speed 3042.67 samples/sec   Loss 10.3138   LearningRate 0.0494   Epoch: 5   Global Step: 73850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:41:17,564-Speed 3047.31 samples/sec   Loss 10.3832   LearningRate 0.0494   Epoch: 5   Global Step: 73860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:41:20,963-Speed 3012.89 samples/sec   Loss 10.2813   LearningRate 0.0494   Epoch: 5   Global Step: 73870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:41:24,436-Speed 2949.59 samples/sec   Loss 10.3801   LearningRate 0.0494   Epoch: 5   Global Step: 73880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:41:27,852-Speed 2998.74 samples/sec   Loss 10.2891   LearningRate 0.0494   Epoch: 5   Global Step: 73890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:41:31,274-Speed 2993.29 samples/sec   Loss 10.1221   LearningRate 0.0494   Epoch: 5   Global Step: 73900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:41:34,595-Speed 3083.45 samples/sec   Loss 10.2639   LearningRate 0.0493   Epoch: 5   Global Step: 73910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:41:37,962-Speed 3042.38 samples/sec   Loss 10.1513   LearningRate 0.0493   Epoch: 5   Global Step: 73920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:41:41,321-Speed 3050.01 samples/sec   Loss 10.3111   LearningRate 0.0493   Epoch: 5   Global Step: 73930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:41:44,759-Speed 2979.39 samples/sec   Loss 10.2671   LearningRate 0.0493   Epoch: 5   Global Step: 73940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:41:48,129-Speed 3038.93 samples/sec   Loss 10.3032   LearningRate 0.0493   Epoch: 5   Global Step: 73950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:41:51,608-Speed 2944.14 samples/sec   Loss 10.2279   LearningRate 0.0493   Epoch: 5   Global Step: 73960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:41:54,994-Speed 3024.92 samples/sec   Loss 10.3748   LearningRate 0.0493   Epoch: 5   Global Step: 73970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:41:58,388-Speed 3018.16 samples/sec   Loss 10.3386   LearningRate 0.0493   Epoch: 5   Global Step: 73980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:42:01,861-Speed 2949.20 samples/sec   Loss 10.2583   LearningRate 0.0493   Epoch: 5   Global Step: 73990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:42:05,242-Speed 3030.13 samples/sec   Loss 10.2606   LearningRate 0.0493   Epoch: 5   Global Step: 74000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:42:08,637-Speed 3016.55 samples/sec   Loss 10.2729   LearningRate 0.0493   Epoch: 5   Global Step: 74010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:42:11,988-Speed 3056.51 samples/sec   Loss 10.2261   LearningRate 0.0493   Epoch: 5   Global Step: 74020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:42:15,343-Speed 3052.84 samples/sec   Loss 10.2448   LearningRate 0.0493   Epoch: 5   Global Step: 74030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:42:18,760-Speed 2997.18 samples/sec   Loss 10.2426   LearningRate 0.0493   Epoch: 5   Global Step: 74040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:42:22,135-Speed 3035.06 samples/sec   Loss 10.3444   LearningRate 0.0493   Epoch: 5   Global Step: 74050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:42:25,521-Speed 3025.88 samples/sec   Loss 10.2757   LearningRate 0.0493   Epoch: 5   Global Step: 74060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:42:28,956-Speed 2981.13 samples/sec   Loss 10.1542   LearningRate 0.0493   Epoch: 5   Global Step: 74070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:42:32,295-Speed 3067.55 samples/sec   Loss 10.4233   LearningRate 0.0493   Epoch: 5   Global Step: 74080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:42:35,650-Speed 3053.46 samples/sec   Loss 10.3772   LearningRate 0.0492   Epoch: 5   Global Step: 74090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:42:39,026-Speed 3033.45 samples/sec   Loss 10.3001   LearningRate 0.0492   Epoch: 5   Global Step: 74100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:42:42,462-Speed 2981.38 samples/sec   Loss 10.2053   LearningRate 0.0492   Epoch: 5   Global Step: 74110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:42:45,851-Speed 3022.44 samples/sec   Loss 10.2481   LearningRate 0.0492   Epoch: 5   Global Step: 74120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:42:49,270-Speed 2995.89 samples/sec   Loss 10.2462   LearningRate 0.0492   Epoch: 5   Global Step: 74130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:42:52,619-Speed 3058.72 samples/sec   Loss 10.2839   LearningRate 0.0492   Epoch: 5   Global Step: 74140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:42:56,014-Speed 3018.01 samples/sec   Loss 10.1839   LearningRate 0.0492   Epoch: 5   Global Step: 74150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:42:59,432-Speed 2996.59 samples/sec   Loss 10.2773   LearningRate 0.0492   Epoch: 5   Global Step: 74160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:43:02,847-Speed 2999.32 samples/sec   Loss 10.1461   LearningRate 0.0492   Epoch: 5   Global Step: 74170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:43:06,247-Speed 3012.31 samples/sec   Loss 10.2035   LearningRate 0.0492   Epoch: 5   Global Step: 74180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:43:09,580-Speed 3073.41 samples/sec   Loss 10.2782   LearningRate 0.0492   Epoch: 5   Global Step: 74190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:43:12,961-Speed 3029.77 samples/sec   Loss 10.3336   LearningRate 0.0492   Epoch: 5   Global Step: 74200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:43:16,380-Speed 2995.89 samples/sec   Loss 10.1883   LearningRate 0.0492   Epoch: 5   Global Step: 74210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:43:19,731-Speed 3057.20 samples/sec   Loss 10.1014   LearningRate 0.0492   Epoch: 5   Global Step: 74220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:43:23,128-Speed 3015.63 samples/sec   Loss 10.2516   LearningRate 0.0492   Epoch: 5   Global Step: 74230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:43:26,493-Speed 3044.70 samples/sec   Loss 10.3004   LearningRate 0.0492   Epoch: 5   Global Step: 74240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:43:29,962-Speed 2952.18 samples/sec   Loss 10.2467   LearningRate 0.0492   Epoch: 5   Global Step: 74250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:43:33,368-Speed 3008.41 samples/sec   Loss 10.1543   LearningRate 0.0492   Epoch: 5   Global Step: 74260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:43:36,694-Speed 3078.96 samples/sec   Loss 10.2853   LearningRate 0.0491   Epoch: 5   Global Step: 74270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:43:40,049-Speed 3053.33 samples/sec   Loss 10.3075   LearningRate 0.0491   Epoch: 5   Global Step: 74280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:43:43,439-Speed 3020.88 samples/sec   Loss 10.3789   LearningRate 0.0491   Epoch: 5   Global Step: 74290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:43:46,801-Speed 3046.77 samples/sec   Loss 10.2828   LearningRate 0.0491   Epoch: 5   Global Step: 74300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:43:50,165-Speed 3044.98 samples/sec   Loss 10.2135   LearningRate 0.0491   Epoch: 5   Global Step: 74310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:43:53,519-Speed 3053.76 samples/sec   Loss 10.1096   LearningRate 0.0491   Epoch: 5   Global Step: 74320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:43:56,988-Speed 2952.87 samples/sec   Loss 10.2387   LearningRate 0.0491   Epoch: 5   Global Step: 74330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:44:00,403-Speed 2998.79 samples/sec   Loss 10.3560   LearningRate 0.0491   Epoch: 5   Global Step: 74340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:44:03,784-Speed 3030.16 samples/sec   Loss 10.2618   LearningRate 0.0491   Epoch: 5   Global Step: 74350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:44:07,180-Speed 3016.36 samples/sec   Loss 10.1436   LearningRate 0.0491   Epoch: 5   Global Step: 74360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:44:10,558-Speed 3031.99 samples/sec   Loss 10.2609   LearningRate 0.0491   Epoch: 5   Global Step: 74370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:44:13,967-Speed 3004.18 samples/sec   Loss 10.2685   LearningRate 0.0491   Epoch: 5   Global Step: 74380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:44:17,337-Speed 3039.85 samples/sec   Loss 10.1258   LearningRate 0.0491   Epoch: 5   Global Step: 74390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:44:20,759-Speed 2993.13 samples/sec   Loss 10.2211   LearningRate 0.0491   Epoch: 5   Global Step: 74400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:44:24,123-Speed 3044.55 samples/sec   Loss 10.3786   LearningRate 0.0491   Epoch: 5   Global Step: 74410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:44:27,524-Speed 3012.59 samples/sec   Loss 10.2506   LearningRate 0.0491   Epoch: 5   Global Step: 74420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:44:30,866-Speed 3064.10 samples/sec   Loss 10.2067   LearningRate 0.0491   Epoch: 5   Global Step: 74430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:44:34,261-Speed 3017.48 samples/sec   Loss 10.1625   LearningRate 0.0490   Epoch: 5   Global Step: 74440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:44:37,656-Speed 3016.82 samples/sec   Loss 10.1750   LearningRate 0.0490   Epoch: 5   Global Step: 74450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:44:41,021-Speed 3043.52 samples/sec   Loss 10.1639   LearningRate 0.0490   Epoch: 5   Global Step: 74460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:44:44,504-Speed 2941.53 samples/sec   Loss 10.3189   LearningRate 0.0490   Epoch: 5   Global Step: 74470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:44:47,917-Speed 3001.21 samples/sec   Loss 10.1237   LearningRate 0.0490   Epoch: 5   Global Step: 74480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:44:51,315-Speed 3013.86 samples/sec   Loss 10.2248   LearningRate 0.0490   Epoch: 5   Global Step: 74490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:44:54,800-Speed 2939.24 samples/sec   Loss 10.4323   LearningRate 0.0490   Epoch: 5   Global Step: 74500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:44:58,192-Speed 3019.95 samples/sec   Loss 10.1319   LearningRate 0.0490   Epoch: 5   Global Step: 74510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:45:01,908-Speed 2756.72 samples/sec   Loss 10.3306   LearningRate 0.0490   Epoch: 5   Global Step: 74520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:45:33,871-Speed 320.39 samples/sec   Loss 9.6998   LearningRate 0.0490   Epoch: 6   Global Step: 74530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:45:37,487-Speed 2833.87 samples/sec   Loss 8.9480   LearningRate 0.0490   Epoch: 6   Global Step: 74540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:45:41,025-Speed 2894.89 samples/sec   Loss 8.8148   LearningRate 0.0490   Epoch: 6   Global Step: 74550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:45:44,481-Speed 2964.42 samples/sec   Loss 8.7749   LearningRate 0.0490   Epoch: 6   Global Step: 74560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:45:47,911-Speed 2986.38 samples/sec   Loss 8.6834   LearningRate 0.0490   Epoch: 6   Global Step: 74570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:45:51,311-Speed 3013.05 samples/sec   Loss 8.7957   LearningRate 0.0490   Epoch: 6   Global Step: 74580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:45:54,739-Speed 2988.34 samples/sec   Loss 8.8649   LearningRate 0.0490   Epoch: 6   Global Step: 74590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:45:58,127-Speed 3023.51 samples/sec   Loss 8.7687   LearningRate 0.0490   Epoch: 6   Global Step: 74600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:46:01,553-Speed 2990.12 samples/sec   Loss 8.7817   LearningRate 0.0490   Epoch: 6   Global Step: 74610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:46:05,065-Speed 2916.00 samples/sec   Loss 8.7817   LearningRate 0.0489   Epoch: 6   Global Step: 74620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:46:08,743-Speed 2785.06 samples/sec   Loss 8.8454   LearningRate 0.0489   Epoch: 6   Global Step: 74630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:46:12,174-Speed 2996.80 samples/sec   Loss 8.8552   LearningRate 0.0489   Epoch: 6   Global Step: 74640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:46:15,541-Speed 3042.33 samples/sec   Loss 8.7392   LearningRate 0.0489   Epoch: 6   Global Step: 74650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:46:18,933-Speed 3019.45 samples/sec   Loss 8.8558   LearningRate 0.0489   Epoch: 6   Global Step: 74660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:46:22,376-Speed 2974.97 samples/sec   Loss 9.0353   LearningRate 0.0489   Epoch: 6   Global Step: 74670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:46:25,749-Speed 3037.25 samples/sec   Loss 8.8821   LearningRate 0.0489   Epoch: 6   Global Step: 74680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:46:29,099-Speed 3058.01 samples/sec   Loss 8.9492   LearningRate 0.0489   Epoch: 6   Global Step: 74690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:46:32,534-Speed 2981.22 samples/sec   Loss 8.9338   LearningRate 0.0489   Epoch: 6   Global Step: 74700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:46:35,970-Speed 2981.41 samples/sec   Loss 8.7856   LearningRate 0.0489   Epoch: 6   Global Step: 74710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:46:39,330-Speed 3048.67 samples/sec   Loss 8.8833   LearningRate 0.0489   Epoch: 6   Global Step: 74720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:46:42,738-Speed 3005.39 samples/sec   Loss 8.8761   LearningRate 0.0489   Epoch: 6   Global Step: 74730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:46:46,101-Speed 3045.99 samples/sec   Loss 8.7385   LearningRate 0.0489   Epoch: 6   Global Step: 74740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:46:49,554-Speed 2966.73 samples/sec   Loss 9.0346   LearningRate 0.0489   Epoch: 6   Global Step: 74750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:46:52,902-Speed 3059.56 samples/sec   Loss 8.9555   LearningRate 0.0489   Epoch: 6   Global Step: 74760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:46:56,283-Speed 3030.47 samples/sec   Loss 8.9609   LearningRate 0.0489   Epoch: 6   Global Step: 74770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:46:59,649-Speed 3042.92 samples/sec   Loss 8.9388   LearningRate 0.0489   Epoch: 6   Global Step: 74780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:47:03,035-Speed 3025.20 samples/sec   Loss 8.9309   LearningRate 0.0489   Epoch: 6   Global Step: 74790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:47:06,379-Speed 3062.65 samples/sec   Loss 8.8191   LearningRate 0.0488   Epoch: 6   Global Step: 74800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:47:09,739-Speed 3048.25 samples/sec   Loss 9.0659   LearningRate 0.0488   Epoch: 6   Global Step: 74810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:47:13,190-Speed 2968.52 samples/sec   Loss 9.0627   LearningRate 0.0488   Epoch: 6   Global Step: 74820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:47:16,645-Speed 2965.00 samples/sec   Loss 8.8793   LearningRate 0.0488   Epoch: 6   Global Step: 74830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:47:20,084-Speed 2977.69 samples/sec   Loss 9.0565   LearningRate 0.0488   Epoch: 6   Global Step: 74840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:47:23,578-Speed 2931.75 samples/sec   Loss 9.1583   LearningRate 0.0488   Epoch: 6   Global Step: 74850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:47:26,984-Speed 3007.84 samples/sec   Loss 8.9714   LearningRate 0.0488   Epoch: 6   Global Step: 74860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:47:30,320-Speed 3070.06 samples/sec   Loss 9.0181   LearningRate 0.0488   Epoch: 6   Global Step: 74870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:47:33,700-Speed 3030.44 samples/sec   Loss 9.0502   LearningRate 0.0488   Epoch: 6   Global Step: 74880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:47:37,107-Speed 3006.82 samples/sec   Loss 8.9486   LearningRate 0.0488   Epoch: 6   Global Step: 74890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:47:40,525-Speed 2996.57 samples/sec   Loss 8.9196   LearningRate 0.0488   Epoch: 6   Global Step: 74900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:47:43,922-Speed 3015.32 samples/sec   Loss 8.9909   LearningRate 0.0488   Epoch: 6   Global Step: 74910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:47:47,305-Speed 3028.58 samples/sec   Loss 8.9809   LearningRate 0.0488   Epoch: 6   Global Step: 74920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:47:50,684-Speed 3031.36 samples/sec   Loss 9.1026   LearningRate 0.0488   Epoch: 6   Global Step: 74930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:47:53,997-Speed 3091.93 samples/sec   Loss 8.9917   LearningRate 0.0488   Epoch: 6   Global Step: 74940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:47:57,387-Speed 3021.55 samples/sec   Loss 9.0385   LearningRate 0.0488   Epoch: 6   Global Step: 74950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:48:00,732-Speed 3061.77 samples/sec   Loss 8.8799   LearningRate 0.0488   Epoch: 6   Global Step: 74960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:48:04,088-Speed 3051.88 samples/sec   Loss 9.0861   LearningRate 0.0488   Epoch: 6   Global Step: 74970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:48:07,510-Speed 2993.58 samples/sec   Loss 9.0813   LearningRate 0.0487   Epoch: 6   Global Step: 74980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:48:10,862-Speed 3056.20 samples/sec   Loss 9.0176   LearningRate 0.0487   Epoch: 6   Global Step: 74990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:48:14,205-Speed 3063.71 samples/sec   Loss 9.1416   LearningRate 0.0487   Epoch: 6   Global Step: 75000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:48:17,667-Speed 2959.12 samples/sec   Loss 9.1205   LearningRate 0.0487   Epoch: 6   Global Step: 75010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 08:48:21,060-Speed 3018.76 samples/sec   Loss 9.0506   LearningRate 0.0487   Epoch: 6   Global Step: 75020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:48:24,458-Speed 3014.93 samples/sec   Loss 9.1461   LearningRate 0.0487   Epoch: 6   Global Step: 75030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:48:27,853-Speed 3017.41 samples/sec   Loss 9.0471   LearningRate 0.0487   Epoch: 6   Global Step: 75040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:48:31,221-Speed 3041.01 samples/sec   Loss 9.1475   LearningRate 0.0487   Epoch: 6   Global Step: 75050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:48:34,560-Speed 3067.87 samples/sec   Loss 9.0485   LearningRate 0.0487   Epoch: 6   Global Step: 75060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:48:37,886-Speed 3079.41 samples/sec   Loss 9.0507   LearningRate 0.0487   Epoch: 6   Global Step: 75070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:48:41,287-Speed 3011.39 samples/sec   Loss 9.1527   LearningRate 0.0487   Epoch: 6   Global Step: 75080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:48:44,677-Speed 3022.27 samples/sec   Loss 9.0377   LearningRate 0.0487   Epoch: 6   Global Step: 75090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:48:48,064-Speed 3024.38 samples/sec   Loss 9.0279   LearningRate 0.0487   Epoch: 6   Global Step: 75100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:48:51,439-Speed 3034.55 samples/sec   Loss 9.1755   LearningRate 0.0487   Epoch: 6   Global Step: 75110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:48:54,876-Speed 2979.99 samples/sec   Loss 9.1152   LearningRate 0.0487   Epoch: 6   Global Step: 75120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:48:58,263-Speed 3024.14 samples/sec   Loss 9.0845   LearningRate 0.0487   Epoch: 6   Global Step: 75130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:01,651-Speed 3023.74 samples/sec   Loss 9.0695   LearningRate 0.0487   Epoch: 6   Global Step: 75140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:05,063-Speed 3002.00 samples/sec   Loss 9.1648   LearningRate 0.0486   Epoch: 6   Global Step: 75150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:08,453-Speed 3020.94 samples/sec   Loss 9.1358   LearningRate 0.0486   Epoch: 6   Global Step: 75160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:11,909-Speed 2963.76 samples/sec   Loss 9.1411   LearningRate 0.0486   Epoch: 6   Global Step: 75170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:15,327-Speed 2996.60 samples/sec   Loss 8.9664   LearningRate 0.0486   Epoch: 6   Global Step: 75180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:18,716-Speed 3022.23 samples/sec   Loss 9.3463   LearningRate 0.0486   Epoch: 6   Global Step: 75190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:22,114-Speed 3014.87 samples/sec   Loss 9.1414   LearningRate 0.0486   Epoch: 6   Global Step: 75200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:25,521-Speed 3006.14 samples/sec   Loss 9.1977   LearningRate 0.0486   Epoch: 6   Global Step: 75210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:28,882-Speed 3047.24 samples/sec   Loss 9.2658   LearningRate 0.0486   Epoch: 6   Global Step: 75220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:32,288-Speed 3007.66 samples/sec   Loss 9.2224   LearningRate 0.0486   Epoch: 6   Global Step: 75230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:35,710-Speed 2993.00 samples/sec   Loss 9.1116   LearningRate 0.0486   Epoch: 6   Global Step: 75240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:39,086-Speed 3034.27 samples/sec   Loss 9.2883   LearningRate 0.0486   Epoch: 6   Global Step: 75250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:42,434-Speed 3058.90 samples/sec   Loss 9.3048   LearningRate 0.0486   Epoch: 6   Global Step: 75260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:49:45,786-Speed 3055.93 samples/sec   Loss 9.2620   LearningRate 0.0486   Epoch: 6   Global Step: 75270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:49:49,122-Speed 3070.58 samples/sec   Loss 9.1512   LearningRate 0.0486   Epoch: 6   Global Step: 75280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:49:52,563-Speed 2976.72 samples/sec   Loss 9.1720   LearningRate 0.0486   Epoch: 6   Global Step: 75290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:49:55,944-Speed 3029.84 samples/sec   Loss 9.1822   LearningRate 0.0486   Epoch: 6   Global Step: 75300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:49:59,335-Speed 3021.03 samples/sec   Loss 9.2114   LearningRate 0.0486   Epoch: 6   Global Step: 75310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:50:02,664-Speed 3076.72 samples/sec   Loss 9.1633   LearningRate 0.0486   Epoch: 6   Global Step: 75320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:50:06,059-Speed 3016.99 samples/sec   Loss 9.3064   LearningRate 0.0485   Epoch: 6   Global Step: 75330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:50:09,431-Speed 3038.09 samples/sec   Loss 9.1933   LearningRate 0.0485   Epoch: 6   Global Step: 75340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:50:12,841-Speed 3004.49 samples/sec   Loss 9.3679   LearningRate 0.0485   Epoch: 6   Global Step: 75350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:50:16,293-Speed 2966.74 samples/sec   Loss 9.2500   LearningRate 0.0485   Epoch: 6   Global Step: 75360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:50:19,713-Speed 2995.05 samples/sec   Loss 9.2270   LearningRate 0.0485   Epoch: 6   Global Step: 75370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:50:23,089-Speed 3034.68 samples/sec   Loss 9.1307   LearningRate 0.0485   Epoch: 6   Global Step: 75380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:50:26,490-Speed 3010.92 samples/sec   Loss 9.3491   LearningRate 0.0485   Epoch: 6   Global Step: 75390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:50:29,873-Speed 3028.42 samples/sec   Loss 9.2493   LearningRate 0.0485   Epoch: 6   Global Step: 75400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:50:33,241-Speed 3041.18 samples/sec   Loss 9.2028   LearningRate 0.0485   Epoch: 6   Global Step: 75410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:50:36,621-Speed 3030.30 samples/sec   Loss 9.3683   LearningRate 0.0485   Epoch: 6   Global Step: 75420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:50:39,996-Speed 3034.54 samples/sec   Loss 9.2578   LearningRate 0.0485   Epoch: 6   Global Step: 75430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:50:43,397-Speed 3011.45 samples/sec   Loss 9.2727   LearningRate 0.0485   Epoch: 6   Global Step: 75440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:50:46,786-Speed 3022.95 samples/sec   Loss 9.3441   LearningRate 0.0485   Epoch: 6   Global Step: 75450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:50:50,153-Speed 3041.77 samples/sec   Loss 9.1911   LearningRate 0.0485   Epoch: 6   Global Step: 75460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:50:53,627-Speed 2948.60 samples/sec   Loss 9.3408   LearningRate 0.0485   Epoch: 6   Global Step: 75470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:50:57,005-Speed 3032.82 samples/sec   Loss 9.3937   LearningRate 0.0485   Epoch: 6   Global Step: 75480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:51:00,427-Speed 2992.93 samples/sec   Loss 9.2078   LearningRate 0.0485   Epoch: 6   Global Step: 75490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:51:03,862-Speed 2982.22 samples/sec   Loss 9.2721   LearningRate 0.0485   Epoch: 6   Global Step: 75500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:51:07,338-Speed 2947.89 samples/sec   Loss 9.3184   LearningRate 0.0484   Epoch: 6   Global Step: 75510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:51:10,738-Speed 3012.44 samples/sec   Loss 9.2335   LearningRate 0.0484   Epoch: 6   Global Step: 75520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:51:14,215-Speed 2945.52 samples/sec   Loss 9.3551   LearningRate 0.0484   Epoch: 6   Global Step: 75530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:51:17,560-Speed 3062.15 samples/sec   Loss 9.4974   LearningRate 0.0484   Epoch: 6   Global Step: 75540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:51:20,990-Speed 2986.24 samples/sec   Loss 9.4088   LearningRate 0.0484   Epoch: 6   Global Step: 75550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:51:24,407-Speed 2997.61 samples/sec   Loss 9.3839   LearningRate 0.0484   Epoch: 6   Global Step: 75560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:51:27,790-Speed 3028.21 samples/sec   Loss 9.4525   LearningRate 0.0484   Epoch: 6   Global Step: 75570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:51:31,129-Speed 3067.81 samples/sec   Loss 9.3630   LearningRate 0.0484   Epoch: 6   Global Step: 75580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:51:34,492-Speed 3045.69 samples/sec   Loss 9.3714   LearningRate 0.0484   Epoch: 6   Global Step: 75590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:51:37,904-Speed 3002.00 samples/sec   Loss 9.3435   LearningRate 0.0484   Epoch: 6   Global Step: 75600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:51:41,308-Speed 3009.02 samples/sec   Loss 9.5091   LearningRate 0.0484   Epoch: 6   Global Step: 75610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:51:44,619-Speed 3093.27 samples/sec   Loss 9.3125   LearningRate 0.0484   Epoch: 6   Global Step: 75620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:51:48,017-Speed 3014.56 samples/sec   Loss 9.4150   LearningRate 0.0484   Epoch: 6   Global Step: 75630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:51:51,375-Speed 3050.83 samples/sec   Loss 9.3398   LearningRate 0.0484   Epoch: 6   Global Step: 75640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:51:54,774-Speed 3013.44 samples/sec   Loss 9.5156   LearningRate 0.0484   Epoch: 6   Global Step: 75650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:51:58,208-Speed 2982.88 samples/sec   Loss 9.4093   LearningRate 0.0484   Epoch: 6   Global Step: 75660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:01,679-Speed 2950.55 samples/sec   Loss 9.3481   LearningRate 0.0484   Epoch: 6   Global Step: 75670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:05,149-Speed 2951.89 samples/sec   Loss 9.4359   LearningRate 0.0484   Epoch: 6   Global Step: 75680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:08,506-Speed 3051.20 samples/sec   Loss 9.4613   LearningRate 0.0483   Epoch: 6   Global Step: 75690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:11,862-Speed 3051.73 samples/sec   Loss 9.5154   LearningRate 0.0483   Epoch: 6   Global Step: 75700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:15,226-Speed 3044.85 samples/sec   Loss 9.5376   LearningRate 0.0483   Epoch: 6   Global Step: 75710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:18,583-Speed 3051.90 samples/sec   Loss 9.4340   LearningRate 0.0483   Epoch: 6   Global Step: 75720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:22,020-Speed 2980.44 samples/sec   Loss 9.3456   LearningRate 0.0483   Epoch: 6   Global Step: 75730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:25,363-Speed 3063.93 samples/sec   Loss 9.3680   LearningRate 0.0483   Epoch: 6   Global Step: 75740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:28,775-Speed 3002.39 samples/sec   Loss 9.4242   LearningRate 0.0483   Epoch: 6   Global Step: 75750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:32,224-Speed 2969.53 samples/sec   Loss 9.5886   LearningRate 0.0483   Epoch: 6   Global Step: 75760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:35,636-Speed 3002.11 samples/sec   Loss 9.4544   LearningRate 0.0483   Epoch: 6   Global Step: 75770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:39,042-Speed 3006.56 samples/sec   Loss 9.4630   LearningRate 0.0483   Epoch: 6   Global Step: 75780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:42,475-Speed 2983.74 samples/sec   Loss 9.5633   LearningRate 0.0483   Epoch: 6   Global Step: 75790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:52:45,846-Speed 3039.59 samples/sec   Loss 9.5647   LearningRate 0.0483   Epoch: 6   Global Step: 75800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:52:49,225-Speed 3030.71 samples/sec   Loss 9.4992   LearningRate 0.0483   Epoch: 6   Global Step: 75810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:52:52,591-Speed 3042.59 samples/sec   Loss 9.5042   LearningRate 0.0483   Epoch: 6   Global Step: 75820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:52:56,016-Speed 2991.65 samples/sec   Loss 9.4965   LearningRate 0.0483   Epoch: 6   Global Step: 75830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:52:59,369-Speed 3054.33 samples/sec   Loss 9.5315   LearningRate 0.0483   Epoch: 6   Global Step: 75840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:02,750-Speed 3030.00 samples/sec   Loss 9.4455   LearningRate 0.0483   Epoch: 6   Global Step: 75850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:06,170-Speed 2994.66 samples/sec   Loss 9.4786   LearningRate 0.0483   Epoch: 6   Global Step: 75860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:09,562-Speed 3019.74 samples/sec   Loss 9.4987   LearningRate 0.0482   Epoch: 6   Global Step: 75870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:13,055-Speed 2932.40 samples/sec   Loss 9.5387   LearningRate 0.0482   Epoch: 6   Global Step: 75880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:16,513-Speed 2961.96 samples/sec   Loss 9.4653   LearningRate 0.0482   Epoch: 6   Global Step: 75890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:19,868-Speed 3053.17 samples/sec   Loss 9.4321   LearningRate 0.0482   Epoch: 6   Global Step: 75900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:53:23,211-Speed 3064.11 samples/sec   Loss 9.5545   LearningRate 0.0482   Epoch: 6   Global Step: 75910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:53:26,602-Speed 3020.89 samples/sec   Loss 9.3820   LearningRate 0.0482   Epoch: 6   Global Step: 75920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:53:30,093-Speed 2933.62 samples/sec   Loss 9.3922   LearningRate 0.0482   Epoch: 6   Global Step: 75930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:33,438-Speed 3062.47 samples/sec   Loss 9.5106   LearningRate 0.0482   Epoch: 6   Global Step: 75940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:36,837-Speed 3013.16 samples/sec   Loss 9.3443   LearningRate 0.0482   Epoch: 6   Global Step: 75950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:40,310-Speed 2949.28 samples/sec   Loss 9.4151   LearningRate 0.0482   Epoch: 6   Global Step: 75960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:43,766-Speed 2964.23 samples/sec   Loss 9.4383   LearningRate 0.0482   Epoch: 6   Global Step: 75970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:47,170-Speed 3008.75 samples/sec   Loss 9.3968   LearningRate 0.0482   Epoch: 6   Global Step: 75980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:50,559-Speed 3022.63 samples/sec   Loss 9.4784   LearningRate 0.0482   Epoch: 6   Global Step: 75990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:53,872-Speed 3092.09 samples/sec   Loss 9.5711   LearningRate 0.0482   Epoch: 6   Global Step: 76000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:53:57,447-Speed 2864.92 samples/sec   Loss 9.5863   LearningRate 0.0482   Epoch: 6   Global Step: 76010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:54:00,926-Speed 2944.73 samples/sec   Loss 9.5619   LearningRate 0.0482   Epoch: 6   Global Step: 76020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 08:54:04,353-Speed 2988.46 samples/sec   Loss 9.5364   LearningRate 0.0482   Epoch: 6   Global Step: 76030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:07,775-Speed 2993.29 samples/sec   Loss 9.5417   LearningRate 0.0482   Epoch: 6   Global Step: 76040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:11,223-Speed 2971.13 samples/sec   Loss 9.5216   LearningRate 0.0481   Epoch: 6   Global Step: 76050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:14,626-Speed 3009.61 samples/sec   Loss 9.4436   LearningRate 0.0481   Epoch: 6   Global Step: 76060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:18,061-Speed 2981.67 samples/sec   Loss 9.4797   LearningRate 0.0481   Epoch: 6   Global Step: 76070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:21,400-Speed 3068.07 samples/sec   Loss 9.4801   LearningRate 0.0481   Epoch: 6   Global Step: 76080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:24,728-Speed 3078.09 samples/sec   Loss 9.6099   LearningRate 0.0481   Epoch: 6   Global Step: 76090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:28,160-Speed 2984.33 samples/sec   Loss 9.5525   LearningRate 0.0481   Epoch: 6   Global Step: 76100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:31,613-Speed 2966.29 samples/sec   Loss 9.5740   LearningRate 0.0481   Epoch: 6   Global Step: 76110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:35,011-Speed 3015.05 samples/sec   Loss 9.7543   LearningRate 0.0481   Epoch: 6   Global Step: 76120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:38,384-Speed 3036.40 samples/sec   Loss 9.4945   LearningRate 0.0481   Epoch: 6   Global Step: 76130   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-27 08:54:41,834-Speed 2968.87 samples/sec   Loss 9.5437   LearningRate 0.0481   Epoch: 6   Global Step: 76140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:45,216-Speed 3028.09 samples/sec   Loss 9.4925   LearningRate 0.0481   Epoch: 6   Global Step: 76150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:48,618-Speed 3011.11 samples/sec   Loss 9.7061   LearningRate 0.0481   Epoch: 6   Global Step: 76160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:51,997-Speed 3031.45 samples/sec   Loss 9.6177   LearningRate 0.0481   Epoch: 6   Global Step: 76170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:55,466-Speed 2952.75 samples/sec   Loss 9.6423   LearningRate 0.0481   Epoch: 6   Global Step: 76180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:54:58,879-Speed 3001.23 samples/sec   Loss 9.6546   LearningRate 0.0481   Epoch: 6   Global Step: 76190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:02,299-Speed 2995.28 samples/sec   Loss 9.5206   LearningRate 0.0481   Epoch: 6   Global Step: 76200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:05,795-Speed 2929.63 samples/sec   Loss 9.5741   LearningRate 0.0481   Epoch: 6   Global Step: 76210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:09,217-Speed 2994.48 samples/sec   Loss 9.4817   LearningRate 0.0480   Epoch: 6   Global Step: 76220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:12,670-Speed 2965.59 samples/sec   Loss 9.5384   LearningRate 0.0480   Epoch: 6   Global Step: 76230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:16,072-Speed 3011.48 samples/sec   Loss 9.6323   LearningRate 0.0480   Epoch: 6   Global Step: 76240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:19,513-Speed 2977.09 samples/sec   Loss 9.6076   LearningRate 0.0480   Epoch: 6   Global Step: 76250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:23,001-Speed 2936.62 samples/sec   Loss 9.5322   LearningRate 0.0480   Epoch: 6   Global Step: 76260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:26,370-Speed 3040.37 samples/sec   Loss 9.6041   LearningRate 0.0480   Epoch: 6   Global Step: 76270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:29,802-Speed 2984.16 samples/sec   Loss 9.6350   LearningRate 0.0480   Epoch: 6   Global Step: 76280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:33,223-Speed 2994.80 samples/sec   Loss 9.7607   LearningRate 0.0480   Epoch: 6   Global Step: 76290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:36,697-Speed 2948.43 samples/sec   Loss 9.5431   LearningRate 0.0480   Epoch: 6   Global Step: 76300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:40,096-Speed 3013.27 samples/sec   Loss 9.6983   LearningRate 0.0480   Epoch: 6   Global Step: 76310   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:43,553-Speed 2962.95 samples/sec   Loss 9.5899   LearningRate 0.0480   Epoch: 6   Global Step: 76320   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:47,004-Speed 2968.83 samples/sec   Loss 9.5463   LearningRate 0.0480   Epoch: 6   Global Step: 76330   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:50,388-Speed 3026.72 samples/sec   Loss 9.6595   LearningRate 0.0480   Epoch: 6   Global Step: 76340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:53,828-Speed 2977.23 samples/sec   Loss 9.8596   LearningRate 0.0480   Epoch: 6   Global Step: 76350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:55:57,215-Speed 3024.18 samples/sec   Loss 9.6962   LearningRate 0.0480   Epoch: 6   Global Step: 76360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:00,621-Speed 3007.39 samples/sec   Loss 9.7333   LearningRate 0.0480   Epoch: 6   Global Step: 76370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:04,012-Speed 3021.17 samples/sec   Loss 9.6678   LearningRate 0.0480   Epoch: 6   Global Step: 76380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:07,454-Speed 2976.02 samples/sec   Loss 9.7074   LearningRate 0.0480   Epoch: 6   Global Step: 76390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:10,855-Speed 3011.10 samples/sec   Loss 9.6164   LearningRate 0.0479   Epoch: 6   Global Step: 76400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:14,248-Speed 3018.89 samples/sec   Loss 9.8359   LearningRate 0.0479   Epoch: 6   Global Step: 76410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:17,603-Speed 3053.53 samples/sec   Loss 9.6829   LearningRate 0.0479   Epoch: 6   Global Step: 76420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:20,965-Speed 3046.26 samples/sec   Loss 9.5953   LearningRate 0.0479   Epoch: 6   Global Step: 76430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:24,301-Speed 3070.62 samples/sec   Loss 9.8254   LearningRate 0.0479   Epoch: 6   Global Step: 76440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:27,713-Speed 3002.20 samples/sec   Loss 9.6451   LearningRate 0.0479   Epoch: 6   Global Step: 76450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:31,118-Speed 3008.11 samples/sec   Loss 9.7508   LearningRate 0.0479   Epoch: 6   Global Step: 76460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:34,445-Speed 3078.72 samples/sec   Loss 9.7547   LearningRate 0.0479   Epoch: 6   Global Step: 76470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:37,923-Speed 2945.62 samples/sec   Loss 9.6173   LearningRate 0.0479   Epoch: 6   Global Step: 76480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:41,274-Speed 3056.78 samples/sec   Loss 9.6626   LearningRate 0.0479   Epoch: 6   Global Step: 76490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:44,714-Speed 2977.10 samples/sec   Loss 9.6271   LearningRate 0.0479   Epoch: 6   Global Step: 76500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:48,143-Speed 2987.19 samples/sec   Loss 9.7875   LearningRate 0.0479   Epoch: 6   Global Step: 76510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:51,502-Speed 3049.85 samples/sec   Loss 9.6441   LearningRate 0.0479   Epoch: 6   Global Step: 76520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:54,892-Speed 3021.55 samples/sec   Loss 9.6263   LearningRate 0.0479   Epoch: 6   Global Step: 76530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:56:58,265-Speed 3036.97 samples/sec   Loss 9.7661   LearningRate 0.0479   Epoch: 6   Global Step: 76540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:01,620-Speed 3053.03 samples/sec   Loss 9.7210   LearningRate 0.0479   Epoch: 6   Global Step: 76550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:05,019-Speed 3015.65 samples/sec   Loss 9.7477   LearningRate 0.0479   Epoch: 6   Global Step: 76560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:08,534-Speed 2913.95 samples/sec   Loss 9.7983   LearningRate 0.0479   Epoch: 6   Global Step: 76570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:11,990-Speed 2964.24 samples/sec   Loss 9.7038   LearningRate 0.0478   Epoch: 6   Global Step: 76580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:15,365-Speed 3034.45 samples/sec   Loss 9.7060   LearningRate 0.0478   Epoch: 6   Global Step: 76590   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:18,708-Speed 3063.82 samples/sec   Loss 9.9089   LearningRate 0.0478   Epoch: 6   Global Step: 76600   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:22,083-Speed 3035.19 samples/sec   Loss 9.8386   LearningRate 0.0478   Epoch: 6   Global Step: 76610   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:25,510-Speed 2989.60 samples/sec   Loss 9.6677   LearningRate 0.0478   Epoch: 6   Global Step: 76620   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:28,874-Speed 3043.85 samples/sec   Loss 9.7126   LearningRate 0.0478   Epoch: 6   Global Step: 76630   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:32,214-Speed 3068.44 samples/sec   Loss 9.8517   LearningRate 0.0478   Epoch: 6   Global Step: 76640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:35,694-Speed 2943.20 samples/sec   Loss 9.8351   LearningRate 0.0478   Epoch: 6   Global Step: 76650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:39,177-Speed 2940.92 samples/sec   Loss 9.7666   LearningRate 0.0478   Epoch: 6   Global Step: 76660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:42,625-Speed 2970.57 samples/sec   Loss 9.8165   LearningRate 0.0478   Epoch: 6   Global Step: 76670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:46,102-Speed 2946.10 samples/sec   Loss 9.5790   LearningRate 0.0478   Epoch: 6   Global Step: 76680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:49,473-Speed 3038.86 samples/sec   Loss 9.8062   LearningRate 0.0478   Epoch: 6   Global Step: 76690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:52,832-Speed 3048.93 samples/sec   Loss 9.6428   LearningRate 0.0478   Epoch: 6   Global Step: 76700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:56,214-Speed 3029.36 samples/sec   Loss 9.7794   LearningRate 0.0478   Epoch: 6   Global Step: 76710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:57:59,611-Speed 3015.48 samples/sec   Loss 9.7813   LearningRate 0.0478   Epoch: 6   Global Step: 76720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:02,990-Speed 3031.81 samples/sec   Loss 9.7568   LearningRate 0.0478   Epoch: 6   Global Step: 76730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:06,321-Speed 3074.83 samples/sec   Loss 9.7910   LearningRate 0.0478   Epoch: 6   Global Step: 76740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:09,745-Speed 2991.33 samples/sec   Loss 9.6689   LearningRate 0.0478   Epoch: 6   Global Step: 76750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:13,156-Speed 3002.76 samples/sec   Loss 9.7442   LearningRate 0.0477   Epoch: 6   Global Step: 76760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:16,522-Speed 3042.92 samples/sec   Loss 9.6425   LearningRate 0.0477   Epoch: 6   Global Step: 76770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:19,890-Speed 3041.19 samples/sec   Loss 9.8334   LearningRate 0.0477   Epoch: 6   Global Step: 76780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:23,296-Speed 3007.25 samples/sec   Loss 9.6262   LearningRate 0.0477   Epoch: 6   Global Step: 76790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:26,699-Speed 3010.23 samples/sec   Loss 9.6280   LearningRate 0.0477   Epoch: 6   Global Step: 76800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:30,106-Speed 3006.66 samples/sec   Loss 9.6748   LearningRate 0.0477   Epoch: 6   Global Step: 76810   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:33,500-Speed 3017.62 samples/sec   Loss 9.7366   LearningRate 0.0477   Epoch: 6   Global Step: 76820   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:36,961-Speed 2961.04 samples/sec   Loss 9.7910   LearningRate 0.0477   Epoch: 6   Global Step: 76830   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:40,302-Speed 3065.80 samples/sec   Loss 9.6933   LearningRate 0.0477   Epoch: 6   Global Step: 76840   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:43,720-Speed 2996.40 samples/sec   Loss 9.6481   LearningRate 0.0477   Epoch: 6   Global Step: 76850   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:47,069-Speed 3058.58 samples/sec   Loss 9.8795   LearningRate 0.0477   Epoch: 6   Global Step: 76860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:50,448-Speed 3031.89 samples/sec   Loss 9.7594   LearningRate 0.0477   Epoch: 6   Global Step: 76870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:53,833-Speed 3025.53 samples/sec   Loss 9.8806   LearningRate 0.0477   Epoch: 6   Global Step: 76880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:58:57,204-Speed 3038.85 samples/sec   Loss 9.7449   LearningRate 0.0477   Epoch: 6   Global Step: 76890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:00,561-Speed 3051.21 samples/sec   Loss 9.8068   LearningRate 0.0477   Epoch: 6   Global Step: 76900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:03,933-Speed 3037.65 samples/sec   Loss 9.7437   LearningRate 0.0477   Epoch: 6   Global Step: 76910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:07,316-Speed 3028.19 samples/sec   Loss 9.7208   LearningRate 0.0477   Epoch: 6   Global Step: 76920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:10,747-Speed 2985.47 samples/sec   Loss 9.8356   LearningRate 0.0477   Epoch: 6   Global Step: 76930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:14,122-Speed 3035.04 samples/sec   Loss 9.7875   LearningRate 0.0476   Epoch: 6   Global Step: 76940   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-27 08:59:17,478-Speed 3052.16 samples/sec   Loss 9.9216   LearningRate 0.0476   Epoch: 6   Global Step: 76950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:20,984-Speed 2921.12 samples/sec   Loss 9.6758   LearningRate 0.0476   Epoch: 6   Global Step: 76960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:24,420-Speed 2981.48 samples/sec   Loss 9.6826   LearningRate 0.0476   Epoch: 6   Global Step: 76970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:27,813-Speed 3018.83 samples/sec   Loss 9.9581   LearningRate 0.0476   Epoch: 6   Global Step: 76980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:31,152-Speed 3067.80 samples/sec   Loss 9.8006   LearningRate 0.0476   Epoch: 6   Global Step: 76990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:34,567-Speed 2999.12 samples/sec   Loss 9.8170   LearningRate 0.0476   Epoch: 6   Global Step: 77000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:37,933-Speed 3043.34 samples/sec   Loss 9.7576   LearningRate 0.0476   Epoch: 6   Global Step: 77010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:41,271-Speed 3068.63 samples/sec   Loss 9.8400   LearningRate 0.0476   Epoch: 6   Global Step: 77020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:44,679-Speed 3005.59 samples/sec   Loss 9.7107   LearningRate 0.0476   Epoch: 6   Global Step: 77030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:48,069-Speed 3021.89 samples/sec   Loss 9.7183   LearningRate 0.0476   Epoch: 6   Global Step: 77040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:51,412-Speed 3064.01 samples/sec   Loss 9.8228   LearningRate 0.0476   Epoch: 6   Global Step: 77050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:54,769-Speed 3051.12 samples/sec   Loss 9.6919   LearningRate 0.0476   Epoch: 6   Global Step: 77060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 08:59:58,109-Speed 3067.16 samples/sec   Loss 9.9137   LearningRate 0.0476   Epoch: 6   Global Step: 77070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:01,547-Speed 2978.81 samples/sec   Loss 9.7698   LearningRate 0.0476   Epoch: 6   Global Step: 77080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:04,969-Speed 2993.81 samples/sec   Loss 9.7853   LearningRate 0.0476   Epoch: 6   Global Step: 77090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:08,375-Speed 3006.99 samples/sec   Loss 9.8443   LearningRate 0.0476   Epoch: 6   Global Step: 77100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:11,820-Speed 2973.76 samples/sec   Loss 9.6998   LearningRate 0.0476   Epoch: 6   Global Step: 77110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:15,149-Speed 3077.02 samples/sec   Loss 9.9969   LearningRate 0.0475   Epoch: 6   Global Step: 77120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:18,526-Speed 3032.85 samples/sec   Loss 9.8382   LearningRate 0.0475   Epoch: 6   Global Step: 77130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:21,879-Speed 3055.17 samples/sec   Loss 9.9079   LearningRate 0.0475   Epoch: 6   Global Step: 77140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:25,243-Speed 3045.07 samples/sec   Loss 9.8379   LearningRate 0.0475   Epoch: 6   Global Step: 77150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:28,629-Speed 3024.81 samples/sec   Loss 9.9720   LearningRate 0.0475   Epoch: 6   Global Step: 77160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:32,141-Speed 2916.88 samples/sec   Loss 9.8594   LearningRate 0.0475   Epoch: 6   Global Step: 77170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:35,529-Speed 3023.34 samples/sec   Loss 9.7779   LearningRate 0.0475   Epoch: 6   Global Step: 77180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:38,845-Speed 3089.53 samples/sec   Loss 9.8606   LearningRate 0.0475   Epoch: 6   Global Step: 77190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:42,254-Speed 3004.18 samples/sec   Loss 9.8193   LearningRate 0.0475   Epoch: 6   Global Step: 77200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:45,663-Speed 3005.34 samples/sec   Loss 9.8569   LearningRate 0.0475   Epoch: 6   Global Step: 77210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:49,029-Speed 3042.41 samples/sec   Loss 9.8716   LearningRate 0.0475   Epoch: 6   Global Step: 77220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:52,461-Speed 2984.88 samples/sec   Loss 9.8831   LearningRate 0.0475   Epoch: 6   Global Step: 77230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:55,846-Speed 3026.12 samples/sec   Loss 9.7007   LearningRate 0.0475   Epoch: 6   Global Step: 77240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:00:59,262-Speed 2998.54 samples/sec   Loss 9.7190   LearningRate 0.0475   Epoch: 6   Global Step: 77250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:02,590-Speed 3077.77 samples/sec   Loss 9.8217   LearningRate 0.0475   Epoch: 6   Global Step: 77260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:05,959-Speed 3040.16 samples/sec   Loss 9.9034   LearningRate 0.0475   Epoch: 6   Global Step: 77270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:09,309-Speed 3057.33 samples/sec   Loss 9.8669   LearningRate 0.0475   Epoch: 6   Global Step: 77280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:12,768-Speed 2962.17 samples/sec   Loss 9.7956   LearningRate 0.0475   Epoch: 6   Global Step: 77290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:16,123-Speed 3052.66 samples/sec   Loss 9.8467   LearningRate 0.0474   Epoch: 6   Global Step: 77300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:19,470-Speed 3060.65 samples/sec   Loss 9.8088   LearningRate 0.0474   Epoch: 6   Global Step: 77310   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:22,817-Speed 3059.61 samples/sec   Loss 9.7237   LearningRate 0.0474   Epoch: 6   Global Step: 77320   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:26,202-Speed 3027.36 samples/sec   Loss 9.7330   LearningRate 0.0474   Epoch: 6   Global Step: 77330   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:29,542-Speed 3066.30 samples/sec   Loss 9.8755   LearningRate 0.0474   Epoch: 6   Global Step: 77340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:32,881-Speed 3067.43 samples/sec   Loss 9.9166   LearningRate 0.0474   Epoch: 6   Global Step: 77350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:36,191-Speed 3095.16 samples/sec   Loss 9.9106   LearningRate 0.0474   Epoch: 6   Global Step: 77360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:39,538-Speed 3060.44 samples/sec   Loss 9.7226   LearningRate 0.0474   Epoch: 6   Global Step: 77370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:42,906-Speed 3041.00 samples/sec   Loss 9.8110   LearningRate 0.0474   Epoch: 6   Global Step: 77380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:46,351-Speed 2973.19 samples/sec   Loss 9.8399   LearningRate 0.0474   Epoch: 6   Global Step: 77390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:49,714-Speed 3046.20 samples/sec   Loss 9.8450   LearningRate 0.0474   Epoch: 6   Global Step: 77400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:53,083-Speed 3040.45 samples/sec   Loss 10.0265   LearningRate 0.0474   Epoch: 6   Global Step: 77410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:56,478-Speed 3017.00 samples/sec   Loss 9.9053   LearningRate 0.0474   Epoch: 6   Global Step: 77420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:01:59,953-Speed 2947.54 samples/sec   Loss 9.8614   LearningRate 0.0474   Epoch: 6   Global Step: 77430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:03,378-Speed 2991.17 samples/sec   Loss 9.7269   LearningRate 0.0474   Epoch: 6   Global Step: 77440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:06,719-Speed 3065.79 samples/sec   Loss 9.8659   LearningRate 0.0474   Epoch: 6   Global Step: 77450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:10,130-Speed 3002.79 samples/sec   Loss 10.0788   LearningRate 0.0474   Epoch: 6   Global Step: 77460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:13,651-Speed 2908.62 samples/sec   Loss 9.7952   LearningRate 0.0474   Epoch: 6   Global Step: 77470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:17,071-Speed 2994.78 samples/sec   Loss 9.7468   LearningRate 0.0473   Epoch: 6   Global Step: 77480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:20,504-Speed 2984.00 samples/sec   Loss 10.0211   LearningRate 0.0473   Epoch: 6   Global Step: 77490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:23,967-Speed 2957.74 samples/sec   Loss 9.8161   LearningRate 0.0473   Epoch: 6   Global Step: 77500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:27,439-Speed 2950.83 samples/sec   Loss 9.8479   LearningRate 0.0473   Epoch: 6   Global Step: 77510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:30,906-Speed 2953.66 samples/sec   Loss 10.0607   LearningRate 0.0473   Epoch: 6   Global Step: 77520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:34,380-Speed 2949.18 samples/sec   Loss 9.9336   LearningRate 0.0473   Epoch: 6   Global Step: 77530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:37,816-Speed 2980.76 samples/sec   Loss 9.8228   LearningRate 0.0473   Epoch: 6   Global Step: 77540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:41,176-Speed 3048.94 samples/sec   Loss 10.0104   LearningRate 0.0473   Epoch: 6   Global Step: 77550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:44,531-Speed 3053.21 samples/sec   Loss 9.8831   LearningRate 0.0473   Epoch: 6   Global Step: 77560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:47,866-Speed 3070.81 samples/sec   Loss 9.6824   LearningRate 0.0473   Epoch: 6   Global Step: 77570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:51,359-Speed 2932.46 samples/sec   Loss 9.7911   LearningRate 0.0473   Epoch: 6   Global Step: 77580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:02:54,691-Speed 3074.51 samples/sec   Loss 9.7929   LearningRate 0.0473   Epoch: 6   Global Step: 77590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:02:58,092-Speed 3011.56 samples/sec   Loss 9.7632   LearningRate 0.0473   Epoch: 6   Global Step: 77600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:03:01,472-Speed 3030.62 samples/sec   Loss 9.9001   LearningRate 0.0473   Epoch: 6   Global Step: 77610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:03:04,866-Speed 3017.67 samples/sec   Loss 9.9470   LearningRate 0.0473   Epoch: 6   Global Step: 77620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:03:08,273-Speed 3006.22 samples/sec   Loss 9.9889   LearningRate 0.0473   Epoch: 6   Global Step: 77630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:03:11,676-Speed 3010.17 samples/sec   Loss 9.9535   LearningRate 0.0473   Epoch: 6   Global Step: 77640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:03:15,119-Speed 2975.17 samples/sec   Loss 9.7665   LearningRate 0.0473   Epoch: 6   Global Step: 77650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:03:18,514-Speed 3017.44 samples/sec   Loss 9.9160   LearningRate 0.0472   Epoch: 6   Global Step: 77660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:03:21,960-Speed 2971.85 samples/sec   Loss 9.8505   LearningRate 0.0472   Epoch: 6   Global Step: 77670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:03:25,309-Speed 3059.13 samples/sec   Loss 9.7716   LearningRate 0.0472   Epoch: 6   Global Step: 77680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:03:28,627-Speed 3086.80 samples/sec   Loss 9.9210   LearningRate 0.0472   Epoch: 6   Global Step: 77690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:03:32,013-Speed 3025.04 samples/sec   Loss 9.8040   LearningRate 0.0472   Epoch: 6   Global Step: 77700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:03:35,388-Speed 3035.56 samples/sec   Loss 9.9155   LearningRate 0.0472   Epoch: 6   Global Step: 77710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:03:38,820-Speed 2984.21 samples/sec   Loss 9.8581   LearningRate 0.0472   Epoch: 6   Global Step: 77720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:03:42,209-Speed 3022.86 samples/sec   Loss 9.7082   LearningRate 0.0472   Epoch: 6   Global Step: 77730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:03:45,635-Speed 2990.44 samples/sec   Loss 9.9470   LearningRate 0.0472   Epoch: 6   Global Step: 77740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:03:49,037-Speed 3011.04 samples/sec   Loss 9.9775   LearningRate 0.0472   Epoch: 6   Global Step: 77750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:03:52,394-Speed 3051.58 samples/sec   Loss 10.0457   LearningRate 0.0472   Epoch: 6   Global Step: 77760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:03:55,742-Speed 3059.98 samples/sec   Loss 9.9426   LearningRate 0.0472   Epoch: 6   Global Step: 77770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:03:59,120-Speed 3031.98 samples/sec   Loss 10.0268   LearningRate 0.0472   Epoch: 6   Global Step: 77780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:04:02,552-Speed 2984.37 samples/sec   Loss 9.9188   LearningRate 0.0472   Epoch: 6   Global Step: 77790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:04:06,033-Speed 2943.24 samples/sec   Loss 9.8458   LearningRate 0.0472   Epoch: 6   Global Step: 77800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:04:09,499-Speed 2954.66 samples/sec   Loss 9.8542   LearningRate 0.0472   Epoch: 6   Global Step: 77810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:04:12,954-Speed 2964.84 samples/sec   Loss 9.9601   LearningRate 0.0472   Epoch: 6   Global Step: 77820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:04:16,342-Speed 3024.47 samples/sec   Loss 9.8400   LearningRate 0.0472   Epoch: 6   Global Step: 77830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:04:19,806-Speed 2956.73 samples/sec   Loss 10.0471   LearningRate 0.0472   Epoch: 6   Global Step: 77840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:04:23,249-Speed 2975.74 samples/sec   Loss 9.7947   LearningRate 0.0471   Epoch: 6   Global Step: 77850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:04:26,692-Speed 2974.93 samples/sec   Loss 9.8066   LearningRate 0.0471   Epoch: 6   Global Step: 77860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:04:30,052-Speed 3048.37 samples/sec   Loss 9.7963   LearningRate 0.0471   Epoch: 6   Global Step: 77870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:04:33,442-Speed 3021.25 samples/sec   Loss 9.9148   LearningRate 0.0471   Epoch: 6   Global Step: 77880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:04:36,948-Speed 2921.91 samples/sec   Loss 10.0421   LearningRate 0.0471   Epoch: 6   Global Step: 77890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:04:40,307-Speed 3049.69 samples/sec   Loss 9.8903   LearningRate 0.0471   Epoch: 6   Global Step: 77900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:04:43,692-Speed 3025.50 samples/sec   Loss 9.8470   LearningRate 0.0471   Epoch: 6   Global Step: 77910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:04:47,215-Speed 2907.80 samples/sec   Loss 9.9131   LearningRate 0.0471   Epoch: 6   Global Step: 77920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:04:50,612-Speed 3015.80 samples/sec   Loss 9.8034   LearningRate 0.0471   Epoch: 6   Global Step: 77930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:04:54,014-Speed 3010.96 samples/sec   Loss 9.8795   LearningRate 0.0471   Epoch: 6   Global Step: 77940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:04:57,443-Speed 2987.19 samples/sec   Loss 10.0352   LearningRate 0.0471   Epoch: 6   Global Step: 77950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:05:00,840-Speed 3015.56 samples/sec   Loss 9.8272   LearningRate 0.0471   Epoch: 6   Global Step: 77960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:05:04,264-Speed 2990.93 samples/sec   Loss 9.7919   LearningRate 0.0471   Epoch: 6   Global Step: 77970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:07,710-Speed 2972.36 samples/sec   Loss 10.0617   LearningRate 0.0471   Epoch: 6   Global Step: 77980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:11,240-Speed 2901.76 samples/sec   Loss 9.9047   LearningRate 0.0471   Epoch: 6   Global Step: 77990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:14,665-Speed 2990.63 samples/sec   Loss 9.8586   LearningRate 0.0471   Epoch: 6   Global Step: 78000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:18,092-Speed 2989.48 samples/sec   Loss 9.8672   LearningRate 0.0471   Epoch: 6   Global Step: 78010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:21,438-Speed 3061.14 samples/sec   Loss 9.8918   LearningRate 0.0471   Epoch: 6   Global Step: 78020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:24,807-Speed 3040.61 samples/sec   Loss 9.9822   LearningRate 0.0470   Epoch: 6   Global Step: 78030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:28,210-Speed 3009.87 samples/sec   Loss 9.9160   LearningRate 0.0470   Epoch: 6   Global Step: 78040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:31,606-Speed 3016.40 samples/sec   Loss 10.0069   LearningRate 0.0470   Epoch: 6   Global Step: 78050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:34,992-Speed 3024.84 samples/sec   Loss 9.8208   LearningRate 0.0470   Epoch: 6   Global Step: 78060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:38,375-Speed 3027.39 samples/sec   Loss 10.0033   LearningRate 0.0470   Epoch: 6   Global Step: 78070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:41,839-Speed 2957.62 samples/sec   Loss 9.7872   LearningRate 0.0470   Epoch: 6   Global Step: 78080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:45,197-Speed 3049.79 samples/sec   Loss 9.7035   LearningRate 0.0470   Epoch: 6   Global Step: 78090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:48,655-Speed 2961.96 samples/sec   Loss 9.9026   LearningRate 0.0470   Epoch: 6   Global Step: 78100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:52,042-Speed 3024.40 samples/sec   Loss 9.8457   LearningRate 0.0470   Epoch: 6   Global Step: 78110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:55,400-Speed 3050.33 samples/sec   Loss 9.9822   LearningRate 0.0470   Epoch: 6   Global Step: 78120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:05:58,756-Speed 3052.75 samples/sec   Loss 9.9453   LearningRate 0.0470   Epoch: 6   Global Step: 78130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:06:02,191-Speed 2981.15 samples/sec   Loss 9.9767   LearningRate 0.0470   Epoch: 6   Global Step: 78140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:06:05,535-Speed 3063.63 samples/sec   Loss 9.7857   LearningRate 0.0470   Epoch: 6   Global Step: 78150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:06:08,989-Speed 2965.40 samples/sec   Loss 9.9126   LearningRate 0.0470   Epoch: 6   Global Step: 78160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:06:12,379-Speed 3021.85 samples/sec   Loss 9.9509   LearningRate 0.0470   Epoch: 6   Global Step: 78170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:15,878-Speed 2927.45 samples/sec   Loss 10.0139   LearningRate 0.0470   Epoch: 6   Global Step: 78180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:19,299-Speed 2994.12 samples/sec   Loss 10.0646   LearningRate 0.0470   Epoch: 6   Global Step: 78190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:22,694-Speed 3016.94 samples/sec   Loss 9.9149   LearningRate 0.0470   Epoch: 6   Global Step: 78200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:26,178-Speed 2940.30 samples/sec   Loss 9.9819   LearningRate 0.0469   Epoch: 6   Global Step: 78210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:29,518-Speed 3066.61 samples/sec   Loss 9.8890   LearningRate 0.0469   Epoch: 6   Global Step: 78220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:32,937-Speed 2995.69 samples/sec   Loss 9.9682   LearningRate 0.0469   Epoch: 6   Global Step: 78230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:36,370-Speed 2983.63 samples/sec   Loss 9.9433   LearningRate 0.0469   Epoch: 6   Global Step: 78240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:39,836-Speed 2955.16 samples/sec   Loss 9.8250   LearningRate 0.0469   Epoch: 6   Global Step: 78250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:43,162-Speed 3079.48 samples/sec   Loss 9.9014   LearningRate 0.0469   Epoch: 6   Global Step: 78260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:46,474-Speed 3093.47 samples/sec   Loss 9.7394   LearningRate 0.0469   Epoch: 6   Global Step: 78270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:49,877-Speed 3009.32 samples/sec   Loss 10.0546   LearningRate 0.0469   Epoch: 6   Global Step: 78280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:53,222-Speed 3062.84 samples/sec   Loss 9.9108   LearningRate 0.0469   Epoch: 6   Global Step: 78290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:56,585-Speed 3045.55 samples/sec   Loss 9.9855   LearningRate 0.0469   Epoch: 6   Global Step: 78300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:06:59,933-Speed 3059.71 samples/sec   Loss 10.0335   LearningRate 0.0469   Epoch: 6   Global Step: 78310   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:03,337-Speed 3009.28 samples/sec   Loss 9.9537   LearningRate 0.0469   Epoch: 6   Global Step: 78320   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:06,710-Speed 3037.41 samples/sec   Loss 10.0034   LearningRate 0.0469   Epoch: 6   Global Step: 78330   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:10,061-Speed 3057.09 samples/sec   Loss 9.8992   LearningRate 0.0469   Epoch: 6   Global Step: 78340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:13,420-Speed 3049.11 samples/sec   Loss 10.0294   LearningRate 0.0469   Epoch: 6   Global Step: 78350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:16,788-Speed 3041.65 samples/sec   Loss 9.9021   LearningRate 0.0469   Epoch: 6   Global Step: 78360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:20,107-Speed 3086.56 samples/sec   Loss 9.7775   LearningRate 0.0469   Epoch: 6   Global Step: 78370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:23,515-Speed 3005.59 samples/sec   Loss 9.9037   LearningRate 0.0469   Epoch: 6   Global Step: 78380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:26,915-Speed 3012.36 samples/sec   Loss 9.9656   LearningRate 0.0468   Epoch: 6   Global Step: 78390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:30,279-Speed 3044.91 samples/sec   Loss 9.7721   LearningRate 0.0468   Epoch: 6   Global Step: 78400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:33,717-Speed 2979.19 samples/sec   Loss 9.8428   LearningRate 0.0468   Epoch: 6   Global Step: 78410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:37,132-Speed 2999.98 samples/sec   Loss 9.9329   LearningRate 0.0468   Epoch: 6   Global Step: 78420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:40,555-Speed 2992.06 samples/sec   Loss 9.8981   LearningRate 0.0468   Epoch: 6   Global Step: 78430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:43,956-Speed 3011.68 samples/sec   Loss 10.0055   LearningRate 0.0468   Epoch: 6   Global Step: 78440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:47,316-Speed 3048.44 samples/sec   Loss 9.9664   LearningRate 0.0468   Epoch: 6   Global Step: 78450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:50,723-Speed 3007.21 samples/sec   Loss 9.9233   LearningRate 0.0468   Epoch: 6   Global Step: 78460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:54,089-Speed 3043.01 samples/sec   Loss 9.9141   LearningRate 0.0468   Epoch: 6   Global Step: 78470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:07:57,487-Speed 3014.08 samples/sec   Loss 9.9739   LearningRate 0.0468   Epoch: 6   Global Step: 78480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:08:00,846-Speed 3050.33 samples/sec   Loss 10.0895   LearningRate 0.0468   Epoch: 6   Global Step: 78490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:08:04,257-Speed 3002.42 samples/sec   Loss 9.8743   LearningRate 0.0468   Epoch: 6   Global Step: 78500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:08:07,664-Speed 3006.59 samples/sec   Loss 9.9317   LearningRate 0.0468   Epoch: 6   Global Step: 78510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:11,075-Speed 3003.62 samples/sec   Loss 9.9453   LearningRate 0.0468   Epoch: 6   Global Step: 78520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:14,467-Speed 3019.21 samples/sec   Loss 10.0000   LearningRate 0.0468   Epoch: 6   Global Step: 78530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:17,876-Speed 3005.20 samples/sec   Loss 9.7598   LearningRate 0.0468   Epoch: 6   Global Step: 78540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:21,270-Speed 3018.44 samples/sec   Loss 9.9614   LearningRate 0.0468   Epoch: 6   Global Step: 78550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:24,681-Speed 3002.92 samples/sec   Loss 9.8951   LearningRate 0.0468   Epoch: 6   Global Step: 78560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:28,071-Speed 3021.04 samples/sec   Loss 9.8619   LearningRate 0.0467   Epoch: 6   Global Step: 78570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:31,418-Speed 3060.66 samples/sec   Loss 9.7884   LearningRate 0.0467   Epoch: 6   Global Step: 78580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:34,875-Speed 2962.61 samples/sec   Loss 9.8880   LearningRate 0.0467   Epoch: 6   Global Step: 78590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:38,214-Speed 3068.33 samples/sec   Loss 9.8499   LearningRate 0.0467   Epoch: 6   Global Step: 78600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:41,639-Speed 2990.48 samples/sec   Loss 9.8695   LearningRate 0.0467   Epoch: 6   Global Step: 78610   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:08:44,996-Speed 3050.65 samples/sec   Loss 10.0130   LearningRate 0.0467   Epoch: 6   Global Step: 78620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:48,401-Speed 3008.90 samples/sec   Loss 9.8907   LearningRate 0.0467   Epoch: 6   Global Step: 78630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:51,808-Speed 3006.22 samples/sec   Loss 9.8947   LearningRate 0.0467   Epoch: 6   Global Step: 78640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:55,218-Speed 3004.52 samples/sec   Loss 9.8517   LearningRate 0.0467   Epoch: 6   Global Step: 78650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:08:58,601-Speed 3027.54 samples/sec   Loss 9.8492   LearningRate 0.0467   Epoch: 6   Global Step: 78660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:09:01,971-Speed 3040.11 samples/sec   Loss 9.7967   LearningRate 0.0467   Epoch: 6   Global Step: 78670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:09:05,366-Speed 3016.67 samples/sec   Loss 9.9256   LearningRate 0.0467   Epoch: 6   Global Step: 78680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:09:08,753-Speed 3024.84 samples/sec   Loss 9.8752   LearningRate 0.0467   Epoch: 6   Global Step: 78690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:09:12,104-Speed 3056.15 samples/sec   Loss 9.7285   LearningRate 0.0467   Epoch: 6   Global Step: 78700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:09:15,571-Speed 2954.75 samples/sec   Loss 10.0214   LearningRate 0.0467   Epoch: 6   Global Step: 78710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:09:19,001-Speed 2986.21 samples/sec   Loss 10.0130   LearningRate 0.0467   Epoch: 6   Global Step: 78720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:09:22,473-Speed 2950.50 samples/sec   Loss 9.9101   LearningRate 0.0467   Epoch: 6   Global Step: 78730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:09:25,858-Speed 3025.47 samples/sec   Loss 9.7900   LearningRate 0.0467   Epoch: 6   Global Step: 78740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:09:29,259-Speed 3012.17 samples/sec   Loss 10.0264   LearningRate 0.0466   Epoch: 6   Global Step: 78750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:09:32,653-Speed 3017.49 samples/sec   Loss 9.8943   LearningRate 0.0466   Epoch: 6   Global Step: 78760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:09:36,180-Speed 2904.44 samples/sec   Loss 10.0131   LearningRate 0.0466   Epoch: 6   Global Step: 78770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:09:39,528-Speed 3059.75 samples/sec   Loss 10.0115   LearningRate 0.0466   Epoch: 6   Global Step: 78780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:09:43,030-Speed 2924.21 samples/sec   Loss 10.0714   LearningRate 0.0466   Epoch: 6   Global Step: 78790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:09:46,449-Speed 2996.01 samples/sec   Loss 9.9299   LearningRate 0.0466   Epoch: 6   Global Step: 78800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:09:49,846-Speed 3015.50 samples/sec   Loss 9.9401   LearningRate 0.0466   Epoch: 6   Global Step: 78810   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:09:53,159-Speed 3092.26 samples/sec   Loss 9.9127   LearningRate 0.0466   Epoch: 6   Global Step: 78820   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:09:56,511-Speed 3055.39 samples/sec   Loss 10.0607   LearningRate 0.0466   Epoch: 6   Global Step: 78830   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:09:59,858-Speed 3060.05 samples/sec   Loss 10.0644   LearningRate 0.0466   Epoch: 6   Global Step: 78840   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:10:03,241-Speed 3028.01 samples/sec   Loss 10.0181   LearningRate 0.0466   Epoch: 6   Global Step: 78850   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:10:06,646-Speed 3007.94 samples/sec   Loss 9.9421   LearningRate 0.0466   Epoch: 6   Global Step: 78860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:10:10,085-Speed 2978.23 samples/sec   Loss 9.9299   LearningRate 0.0466   Epoch: 6   Global Step: 78870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:10:13,460-Speed 3035.10 samples/sec   Loss 10.0356   LearningRate 0.0466   Epoch: 6   Global Step: 78880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:10:16,902-Speed 2976.49 samples/sec   Loss 9.9891   LearningRate 0.0466   Epoch: 6   Global Step: 78890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:10:20,341-Speed 2978.41 samples/sec   Loss 9.8531   LearningRate 0.0466   Epoch: 6   Global Step: 78900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:10:23,672-Speed 3074.84 samples/sec   Loss 9.8107   LearningRate 0.0466   Epoch: 6   Global Step: 78910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:10:27,121-Speed 2970.77 samples/sec   Loss 9.9356   LearningRate 0.0466   Epoch: 6   Global Step: 78920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:10:30,496-Speed 3034.17 samples/sec   Loss 9.8883   LearningRate 0.0465   Epoch: 6   Global Step: 78930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:10:33,943-Speed 2972.30 samples/sec   Loss 9.9294   LearningRate 0.0465   Epoch: 6   Global Step: 78940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:10:37,335-Speed 3019.04 samples/sec   Loss 9.8915   LearningRate 0.0465   Epoch: 6   Global Step: 78950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:10:40,695-Speed 3049.21 samples/sec   Loss 9.9803   LearningRate 0.0465   Epoch: 6   Global Step: 78960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:10:44,150-Speed 2964.74 samples/sec   Loss 9.9703   LearningRate 0.0465   Epoch: 6   Global Step: 78970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:10:47,554-Speed 3008.70 samples/sec   Loss 9.9158   LearningRate 0.0465   Epoch: 6   Global Step: 78980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:10:50,910-Speed 3052.53 samples/sec   Loss 10.1433   LearningRate 0.0465   Epoch: 6   Global Step: 78990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:10:54,282-Speed 3037.44 samples/sec   Loss 9.7681   LearningRate 0.0465   Epoch: 6   Global Step: 79000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:10:57,662-Speed 3030.68 samples/sec   Loss 9.8737   LearningRate 0.0465   Epoch: 6   Global Step: 79010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:01,082-Speed 2995.24 samples/sec   Loss 9.9449   LearningRate 0.0465   Epoch: 6   Global Step: 79020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:04,528-Speed 2972.43 samples/sec   Loss 10.0276   LearningRate 0.0465   Epoch: 6   Global Step: 79030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:07,965-Speed 2979.99 samples/sec   Loss 9.7789   LearningRate 0.0465   Epoch: 6   Global Step: 79040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:11,421-Speed 2964.20 samples/sec   Loss 10.0269   LearningRate 0.0465   Epoch: 6   Global Step: 79050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:14,758-Speed 3068.87 samples/sec   Loss 9.9623   LearningRate 0.0465   Epoch: 6   Global Step: 79060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:18,153-Speed 3017.66 samples/sec   Loss 9.8099   LearningRate 0.0465   Epoch: 6   Global Step: 79070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:21,555-Speed 3011.04 samples/sec   Loss 9.8793   LearningRate 0.0465   Epoch: 6   Global Step: 79080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:24,875-Speed 3085.00 samples/sec   Loss 9.9589   LearningRate 0.0465   Epoch: 6   Global Step: 79090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:28,191-Speed 3088.64 samples/sec   Loss 10.0547   LearningRate 0.0465   Epoch: 6   Global Step: 79100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:31,638-Speed 2972.30 samples/sec   Loss 10.0314   LearningRate 0.0465   Epoch: 6   Global Step: 79110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:35,038-Speed 3013.05 samples/sec   Loss 9.9004   LearningRate 0.0464   Epoch: 6   Global Step: 79120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:38,406-Speed 3040.66 samples/sec   Loss 10.0068   LearningRate 0.0464   Epoch: 6   Global Step: 79130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:41,857-Speed 2968.27 samples/sec   Loss 9.9768   LearningRate 0.0464   Epoch: 6   Global Step: 79140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:45,240-Speed 3027.68 samples/sec   Loss 9.8919   LearningRate 0.0464   Epoch: 6   Global Step: 79150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:48,636-Speed 3017.44 samples/sec   Loss 10.1381   LearningRate 0.0464   Epoch: 6   Global Step: 79160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:52,059-Speed 2992.64 samples/sec   Loss 9.9694   LearningRate 0.0464   Epoch: 6   Global Step: 79170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:55,445-Speed 3024.48 samples/sec   Loss 9.9636   LearningRate 0.0464   Epoch: 6   Global Step: 79180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:11:58,841-Speed 3016.00 samples/sec   Loss 9.7875   LearningRate 0.0464   Epoch: 6   Global Step: 79190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:12:02,197-Speed 3052.03 samples/sec   Loss 9.9042   LearningRate 0.0464   Epoch: 6   Global Step: 79200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:12:05,627-Speed 2986.53 samples/sec   Loss 9.8724   LearningRate 0.0464   Epoch: 6   Global Step: 79210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:12:09,101-Speed 2948.06 samples/sec   Loss 9.8962   LearningRate 0.0464   Epoch: 6   Global Step: 79220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:12:12,561-Speed 2960.27 samples/sec   Loss 9.8178   LearningRate 0.0464   Epoch: 6   Global Step: 79230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:12:15,970-Speed 3004.60 samples/sec   Loss 9.9858   LearningRate 0.0464   Epoch: 6   Global Step: 79240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:12:19,339-Speed 3040.99 samples/sec   Loss 9.8745   LearningRate 0.0464   Epoch: 6   Global Step: 79250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:12:22,737-Speed 3014.42 samples/sec   Loss 10.0019   LearningRate 0.0464   Epoch: 6   Global Step: 79260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:12:26,156-Speed 2995.52 samples/sec   Loss 10.0250   LearningRate 0.0464   Epoch: 6   Global Step: 79270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:12:29,540-Speed 3027.43 samples/sec   Loss 10.0698   LearningRate 0.0464   Epoch: 6   Global Step: 79280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:12:32,961-Speed 2993.98 samples/sec   Loss 9.8625   LearningRate 0.0464   Epoch: 6   Global Step: 79290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:12:36,397-Speed 2980.79 samples/sec   Loss 9.9015   LearningRate 0.0463   Epoch: 6   Global Step: 79300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:12:39,828-Speed 2986.05 samples/sec   Loss 9.9134   LearningRate 0.0463   Epoch: 6   Global Step: 79310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:12:43,301-Speed 2949.84 samples/sec   Loss 9.7248   LearningRate 0.0463   Epoch: 6   Global Step: 79320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:12:46,664-Speed 3045.44 samples/sec   Loss 9.9027   LearningRate 0.0463   Epoch: 6   Global Step: 79330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:12:50,096-Speed 2985.24 samples/sec   Loss 9.8904   LearningRate 0.0463   Epoch: 6   Global Step: 79340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:12:53,517-Speed 2993.40 samples/sec   Loss 9.8894   LearningRate 0.0463   Epoch: 6   Global Step: 79350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:12:56,865-Speed 3059.71 samples/sec   Loss 9.9071   LearningRate 0.0463   Epoch: 6   Global Step: 79360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:13:00,352-Speed 2938.05 samples/sec   Loss 10.0375   LearningRate 0.0463   Epoch: 6   Global Step: 79370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:13:03,684-Speed 3073.34 samples/sec   Loss 9.9229   LearningRate 0.0463   Epoch: 6   Global Step: 79380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:13:07,113-Speed 2987.75 samples/sec   Loss 9.8814   LearningRate 0.0463   Epoch: 6   Global Step: 79390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:13:10,489-Speed 3035.05 samples/sec   Loss 10.0321   LearningRate 0.0463   Epoch: 6   Global Step: 79400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:13:13,934-Speed 2972.87 samples/sec   Loss 9.8075   LearningRate 0.0463   Epoch: 6   Global Step: 79410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:13:17,355-Speed 2994.04 samples/sec   Loss 9.9054   LearningRate 0.0463   Epoch: 6   Global Step: 79420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:13:20,739-Speed 3027.49 samples/sec   Loss 9.9772   LearningRate 0.0463   Epoch: 6   Global Step: 79430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:13:24,196-Speed 2963.08 samples/sec   Loss 9.7782   LearningRate 0.0463   Epoch: 6   Global Step: 79440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:13:27,576-Speed 3030.82 samples/sec   Loss 10.0750   LearningRate 0.0463   Epoch: 6   Global Step: 79450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:13:30,992-Speed 2998.18 samples/sec   Loss 9.9632   LearningRate 0.0463   Epoch: 6   Global Step: 79460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:13:34,397-Speed 3008.59 samples/sec   Loss 10.0766   LearningRate 0.0463   Epoch: 6   Global Step: 79470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:13:37,836-Speed 2978.65 samples/sec   Loss 9.9230   LearningRate 0.0462   Epoch: 6   Global Step: 79480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:13:41,267-Speed 2985.01 samples/sec   Loss 9.8539   LearningRate 0.0462   Epoch: 6   Global Step: 79490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:13:44,697-Speed 2986.78 samples/sec   Loss 9.8874   LearningRate 0.0462   Epoch: 6   Global Step: 79500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:13:48,141-Speed 2973.80 samples/sec   Loss 9.9942   LearningRate 0.0462   Epoch: 6   Global Step: 79510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:13:51,474-Speed 3073.33 samples/sec   Loss 9.6469   LearningRate 0.0462   Epoch: 6   Global Step: 79520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:13:54,937-Speed 2958.46 samples/sec   Loss 9.8867   LearningRate 0.0462   Epoch: 6   Global Step: 79530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:13:58,313-Speed 3033.53 samples/sec   Loss 9.8713   LearningRate 0.0462   Epoch: 6   Global Step: 79540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:14:01,732-Speed 2995.95 samples/sec   Loss 9.9515   LearningRate 0.0462   Epoch: 6   Global Step: 79550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:14:05,191-Speed 2961.35 samples/sec   Loss 9.8585   LearningRate 0.0462   Epoch: 6   Global Step: 79560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:14:08,521-Speed 3075.75 samples/sec   Loss 9.8730   LearningRate 0.0462   Epoch: 6   Global Step: 79570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:14:11,882-Speed 3047.70 samples/sec   Loss 10.0942   LearningRate 0.0462   Epoch: 6   Global Step: 79580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:14:15,254-Speed 3037.44 samples/sec   Loss 9.8733   LearningRate 0.0462   Epoch: 6   Global Step: 79590   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:14:18,628-Speed 3036.32 samples/sec   Loss 9.9007   LearningRate 0.0462   Epoch: 6   Global Step: 79600   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:14:22,005-Speed 3032.81 samples/sec   Loss 9.8307   LearningRate 0.0462   Epoch: 6   Global Step: 79610   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:14:25,378-Speed 3036.84 samples/sec   Loss 9.9275   LearningRate 0.0462   Epoch: 6   Global Step: 79620   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:14:28,727-Speed 3059.11 samples/sec   Loss 9.8582   LearningRate 0.0462   Epoch: 6   Global Step: 79630   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:14:32,038-Speed 3093.42 samples/sec   Loss 9.7542   LearningRate 0.0462   Epoch: 6   Global Step: 79640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:14:35,457-Speed 2995.02 samples/sec   Loss 9.9006   LearningRate 0.0462   Epoch: 6   Global Step: 79650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:14:38,844-Speed 3024.58 samples/sec   Loss 9.7628   LearningRate 0.0461   Epoch: 6   Global Step: 79660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:14:42,226-Speed 3028.61 samples/sec   Loss 10.0072   LearningRate 0.0461   Epoch: 6   Global Step: 79670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:14:45,579-Speed 3055.06 samples/sec   Loss 9.9707   LearningRate 0.0461   Epoch: 6   Global Step: 79680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:14:48,983-Speed 3009.43 samples/sec   Loss 9.9173   LearningRate 0.0461   Epoch: 6   Global Step: 79690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:14:52,376-Speed 3019.16 samples/sec   Loss 9.9658   LearningRate 0.0461   Epoch: 6   Global Step: 79700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:14:55,880-Speed 2922.95 samples/sec   Loss 9.9208   LearningRate 0.0461   Epoch: 6   Global Step: 79710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:14:59,223-Speed 3064.35 samples/sec   Loss 9.8116   LearningRate 0.0461   Epoch: 6   Global Step: 79720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:15:02,645-Speed 2993.55 samples/sec   Loss 9.9542   LearningRate 0.0461   Epoch: 6   Global Step: 79730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:15:06,048-Speed 3009.71 samples/sec   Loss 9.8271   LearningRate 0.0461   Epoch: 6   Global Step: 79740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:15:09,412-Speed 3045.02 samples/sec   Loss 9.8479   LearningRate 0.0461   Epoch: 6   Global Step: 79750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:15:12,789-Speed 3033.12 samples/sec   Loss 9.9795   LearningRate 0.0461   Epoch: 6   Global Step: 79760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:15:16,129-Speed 3066.96 samples/sec   Loss 10.0176   LearningRate 0.0461   Epoch: 6   Global Step: 79770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:15:19,482-Speed 3054.32 samples/sec   Loss 9.9552   LearningRate 0.0461   Epoch: 6   Global Step: 79780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:15:22,943-Speed 2960.68 samples/sec   Loss 10.0252   LearningRate 0.0461   Epoch: 6   Global Step: 79790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:15:26,377-Speed 2982.38 samples/sec   Loss 9.9071   LearningRate 0.0461   Epoch: 6   Global Step: 79800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:15:29,754-Speed 3033.78 samples/sec   Loss 9.7359   LearningRate 0.0461   Epoch: 6   Global Step: 79810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:15:33,171-Speed 2997.02 samples/sec   Loss 9.9476   LearningRate 0.0461   Epoch: 6   Global Step: 79820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:15:36,500-Speed 3077.25 samples/sec   Loss 9.8493   LearningRate 0.0461   Epoch: 6   Global Step: 79830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:15:39,818-Speed 3086.80 samples/sec   Loss 9.8443   LearningRate 0.0461   Epoch: 6   Global Step: 79840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:15:43,253-Speed 2982.19 samples/sec   Loss 9.8066   LearningRate 0.0460   Epoch: 6   Global Step: 79850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:15:46,654-Speed 3011.92 samples/sec   Loss 9.7738   LearningRate 0.0460   Epoch: 6   Global Step: 79860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:15:49,994-Speed 3067.28 samples/sec   Loss 9.8857   LearningRate 0.0460   Epoch: 6   Global Step: 79870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:15:53,360-Speed 3042.98 samples/sec   Loss 10.0170   LearningRate 0.0460   Epoch: 6   Global Step: 79880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:15:56,803-Speed 2975.10 samples/sec   Loss 9.7927   LearningRate 0.0460   Epoch: 6   Global Step: 79890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:00,209-Speed 3008.00 samples/sec   Loss 9.8631   LearningRate 0.0460   Epoch: 6   Global Step: 79900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:03,544-Speed 3071.45 samples/sec   Loss 9.7649   LearningRate 0.0460   Epoch: 6   Global Step: 79910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:07,001-Speed 2962.23 samples/sec   Loss 9.8076   LearningRate 0.0460   Epoch: 6   Global Step: 79920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:10,400-Speed 3014.06 samples/sec   Loss 9.8919   LearningRate 0.0460   Epoch: 6   Global Step: 79930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:13,821-Speed 2993.74 samples/sec   Loss 10.0415   LearningRate 0.0460   Epoch: 6   Global Step: 79940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:17,262-Speed 2977.10 samples/sec   Loss 9.9912   LearningRate 0.0460   Epoch: 6   Global Step: 79950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:20,632-Speed 3040.01 samples/sec   Loss 9.7659   LearningRate 0.0460   Epoch: 6   Global Step: 79960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:24,017-Speed 3025.86 samples/sec   Loss 10.0235   LearningRate 0.0460   Epoch: 6   Global Step: 79970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:16:27,471-Speed 2965.81 samples/sec   Loss 9.8859   LearningRate 0.0460   Epoch: 6   Global Step: 79980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:16:30,872-Speed 3012.47 samples/sec   Loss 9.9063   LearningRate 0.0460   Epoch: 6   Global Step: 79990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:16:34,230-Speed 3050.84 samples/sec   Loss 9.9558   LearningRate 0.0460   Epoch: 6   Global Step: 80000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:16:37,645-Speed 2999.61 samples/sec   Loss 9.8668   LearningRate 0.0460   Epoch: 6   Global Step: 80010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:16:40,989-Speed 3062.79 samples/sec   Loss 9.9123   LearningRate 0.0460   Epoch: 6   Global Step: 80020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:44,407-Speed 2997.03 samples/sec   Loss 9.9142   LearningRate 0.0459   Epoch: 6   Global Step: 80030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:47,735-Speed 3077.35 samples/sec   Loss 9.8123   LearningRate 0.0459   Epoch: 6   Global Step: 80040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:51,130-Speed 3017.65 samples/sec   Loss 9.9814   LearningRate 0.0459   Epoch: 6   Global Step: 80050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:54,559-Speed 2987.60 samples/sec   Loss 9.9187   LearningRate 0.0459   Epoch: 6   Global Step: 80060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:16:57,927-Speed 3040.40 samples/sec   Loss 9.8734   LearningRate 0.0459   Epoch: 6   Global Step: 80070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:01,286-Speed 3049.80 samples/sec   Loss 9.9597   LearningRate 0.0459   Epoch: 6   Global Step: 80080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:04,754-Speed 2953.38 samples/sec   Loss 9.9226   LearningRate 0.0459   Epoch: 6   Global Step: 80090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:08,161-Speed 3006.66 samples/sec   Loss 9.8140   LearningRate 0.0459   Epoch: 6   Global Step: 80100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:11,559-Speed 3014.76 samples/sec   Loss 9.9102   LearningRate 0.0459   Epoch: 6   Global Step: 80110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:14,983-Speed 2991.91 samples/sec   Loss 9.9186   LearningRate 0.0459   Epoch: 6   Global Step: 80120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:17:18,448-Speed 2955.61 samples/sec   Loss 9.7691   LearningRate 0.0459   Epoch: 6   Global Step: 80130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:21,840-Speed 3019.94 samples/sec   Loss 9.8227   LearningRate 0.0459   Epoch: 6   Global Step: 80140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:25,332-Speed 2932.90 samples/sec   Loss 9.9210   LearningRate 0.0459   Epoch: 6   Global Step: 80150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:28,770-Speed 2979.51 samples/sec   Loss 9.9841   LearningRate 0.0459   Epoch: 6   Global Step: 80160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:32,185-Speed 2999.50 samples/sec   Loss 10.0065   LearningRate 0.0459   Epoch: 6   Global Step: 80170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:35,603-Speed 2996.93 samples/sec   Loss 9.8681   LearningRate 0.0459   Epoch: 6   Global Step: 80180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:38,910-Speed 3097.09 samples/sec   Loss 10.0436   LearningRate 0.0459   Epoch: 6   Global Step: 80190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:42,290-Speed 3030.84 samples/sec   Loss 9.9374   LearningRate 0.0459   Epoch: 6   Global Step: 80200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:45,629-Speed 3067.53 samples/sec   Loss 9.9245   LearningRate 0.0458   Epoch: 6   Global Step: 80210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:49,030-Speed 3012.33 samples/sec   Loss 9.9406   LearningRate 0.0458   Epoch: 6   Global Step: 80220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:17:52,456-Speed 2989.60 samples/sec   Loss 9.9263   LearningRate 0.0458   Epoch: 6   Global Step: 80230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:17:55,926-Speed 2951.51 samples/sec   Loss 9.8937   LearningRate 0.0458   Epoch: 6   Global Step: 80240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:17:59,320-Speed 3018.09 samples/sec   Loss 9.8821   LearningRate 0.0458   Epoch: 6   Global Step: 80250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:18:02,684-Speed 3044.58 samples/sec   Loss 9.9868   LearningRate 0.0458   Epoch: 6   Global Step: 80260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:18:06,097-Speed 3001.38 samples/sec   Loss 9.8796   LearningRate 0.0458   Epoch: 6   Global Step: 80270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:18:09,583-Speed 2938.49 samples/sec   Loss 9.8408   LearningRate 0.0458   Epoch: 6   Global Step: 80280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:18:12,972-Speed 3022.16 samples/sec   Loss 9.9400   LearningRate 0.0458   Epoch: 6   Global Step: 80290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:18:16,350-Speed 3033.28 samples/sec   Loss 9.9678   LearningRate 0.0458   Epoch: 6   Global Step: 80300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:18:19,732-Speed 3027.87 samples/sec   Loss 9.8718   LearningRate 0.0458   Epoch: 6   Global Step: 80310   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:18:23,170-Speed 2979.43 samples/sec   Loss 9.9651   LearningRate 0.0458   Epoch: 6   Global Step: 80320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:18:26,562-Speed 3019.50 samples/sec   Loss 9.9749   LearningRate 0.0458   Epoch: 6   Global Step: 80330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:18:29,975-Speed 3001.63 samples/sec   Loss 9.9425   LearningRate 0.0458   Epoch: 6   Global Step: 80340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:18:33,395-Speed 2995.15 samples/sec   Loss 9.9213   LearningRate 0.0458   Epoch: 6   Global Step: 80350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:18:36,864-Speed 2953.00 samples/sec   Loss 10.0787   LearningRate 0.0458   Epoch: 6   Global Step: 80360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:18:40,245-Speed 3029.20 samples/sec   Loss 9.9101   LearningRate 0.0458   Epoch: 6   Global Step: 80370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:18:43,654-Speed 3004.82 samples/sec   Loss 9.9625   LearningRate 0.0458   Epoch: 6   Global Step: 80380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:18:47,023-Speed 3039.88 samples/sec   Loss 9.7882   LearningRate 0.0458   Epoch: 6   Global Step: 80390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:18:50,437-Speed 3000.49 samples/sec   Loss 10.0600   LearningRate 0.0457   Epoch: 6   Global Step: 80400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:18:53,792-Speed 3053.16 samples/sec   Loss 9.7911   LearningRate 0.0457   Epoch: 6   Global Step: 80410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:18:57,112-Speed 3085.86 samples/sec   Loss 9.9526   LearningRate 0.0457   Epoch: 6   Global Step: 80420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:19:00,529-Speed 2998.18 samples/sec   Loss 9.9087   LearningRate 0.0457   Epoch: 6   Global Step: 80430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:19:03,888-Speed 3049.46 samples/sec   Loss 9.8144   LearningRate 0.0457   Epoch: 6   Global Step: 80440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:19:07,219-Speed 3074.46 samples/sec   Loss 9.9498   LearningRate 0.0457   Epoch: 6   Global Step: 80450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:19:10,563-Speed 3063.52 samples/sec   Loss 9.9748   LearningRate 0.0457   Epoch: 6   Global Step: 80460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:19:13,967-Speed 3008.53 samples/sec   Loss 9.8544   LearningRate 0.0457   Epoch: 6   Global Step: 80470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:19:17,326-Speed 3049.58 samples/sec   Loss 9.9290   LearningRate 0.0457   Epoch: 6   Global Step: 80480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:19:20,661-Speed 3072.21 samples/sec   Loss 9.7994   LearningRate 0.0457   Epoch: 6   Global Step: 80490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:19:24,040-Speed 3030.48 samples/sec   Loss 9.8051   LearningRate 0.0457   Epoch: 6   Global Step: 80500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:19:27,456-Speed 2999.06 samples/sec   Loss 10.0108   LearningRate 0.0457   Epoch: 6   Global Step: 80510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:19:30,781-Speed 3080.23 samples/sec   Loss 9.9759   LearningRate 0.0457   Epoch: 6   Global Step: 80520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:19:34,129-Speed 3059.33 samples/sec   Loss 9.9370   LearningRate 0.0457   Epoch: 6   Global Step: 80530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:19:37,509-Speed 3030.30 samples/sec   Loss 9.9433   LearningRate 0.0457   Epoch: 6   Global Step: 80540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:19:40,978-Speed 2953.61 samples/sec   Loss 9.8242   LearningRate 0.0457   Epoch: 6   Global Step: 80550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:19:44,294-Speed 3089.13 samples/sec   Loss 9.9890   LearningRate 0.0457   Epoch: 6   Global Step: 80560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:19:47,662-Speed 3040.69 samples/sec   Loss 9.9463   LearningRate 0.0457   Epoch: 6   Global Step: 80570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:19:51,053-Speed 3021.07 samples/sec   Loss 9.8707   LearningRate 0.0456   Epoch: 6   Global Step: 80580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:19:54,433-Speed 3030.46 samples/sec   Loss 10.0690   LearningRate 0.0456   Epoch: 6   Global Step: 80590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:19:57,875-Speed 2975.57 samples/sec   Loss 9.9306   LearningRate 0.0456   Epoch: 6   Global Step: 80600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:01,355-Speed 2943.28 samples/sec   Loss 9.8806   LearningRate 0.0456   Epoch: 6   Global Step: 80610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:04,764-Speed 3005.19 samples/sec   Loss 9.9502   LearningRate 0.0456   Epoch: 6   Global Step: 80620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:08,173-Speed 3004.53 samples/sec   Loss 9.8826   LearningRate 0.0456   Epoch: 6   Global Step: 80630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:11,551-Speed 3032.45 samples/sec   Loss 9.8318   LearningRate 0.0456   Epoch: 6   Global Step: 80640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:14,904-Speed 3054.91 samples/sec   Loss 9.8591   LearningRate 0.0456   Epoch: 6   Global Step: 80650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:18,220-Speed 3088.81 samples/sec   Loss 9.9565   LearningRate 0.0456   Epoch: 6   Global Step: 80660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:21,579-Speed 3049.10 samples/sec   Loss 9.8250   LearningRate 0.0456   Epoch: 6   Global Step: 80670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:25,053-Speed 2948.65 samples/sec   Loss 9.8636   LearningRate 0.0456   Epoch: 6   Global Step: 80680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:28,505-Speed 2967.35 samples/sec   Loss 9.7641   LearningRate 0.0456   Epoch: 6   Global Step: 80690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:20:31,901-Speed 3016.28 samples/sec   Loss 9.7430   LearningRate 0.0456   Epoch: 6   Global Step: 80700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:35,275-Speed 3035.96 samples/sec   Loss 9.9940   LearningRate 0.0456   Epoch: 6   Global Step: 80710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:38,769-Speed 2930.80 samples/sec   Loss 9.8736   LearningRate 0.0456   Epoch: 6   Global Step: 80720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:42,224-Speed 2965.54 samples/sec   Loss 9.9020   LearningRate 0.0456   Epoch: 6   Global Step: 80730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:45,639-Speed 2999.67 samples/sec   Loss 9.8807   LearningRate 0.0456   Epoch: 6   Global Step: 80740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:49,126-Speed 2937.04 samples/sec   Loss 9.9585   LearningRate 0.0456   Epoch: 6   Global Step: 80750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:52,499-Speed 3036.63 samples/sec   Loss 9.8521   LearningRate 0.0455   Epoch: 6   Global Step: 80760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:55,896-Speed 3015.55 samples/sec   Loss 9.9582   LearningRate 0.0455   Epoch: 6   Global Step: 80770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:20:59,191-Speed 3108.07 samples/sec   Loss 9.8250   LearningRate 0.0455   Epoch: 6   Global Step: 80780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:21:02,629-Speed 2979.95 samples/sec   Loss 9.9859   LearningRate 0.0455   Epoch: 6   Global Step: 80790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:21:05,991-Speed 3046.84 samples/sec   Loss 9.9067   LearningRate 0.0455   Epoch: 6   Global Step: 80800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:21:09,319-Speed 3077.37 samples/sec   Loss 9.8363   LearningRate 0.0455   Epoch: 6   Global Step: 80810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:21:12,655-Speed 3070.75 samples/sec   Loss 9.8797   LearningRate 0.0455   Epoch: 6   Global Step: 80820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:21:16,056-Speed 3012.07 samples/sec   Loss 9.9981   LearningRate 0.0455   Epoch: 6   Global Step: 80830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:21:19,445-Speed 3021.83 samples/sec   Loss 9.8536   LearningRate 0.0455   Epoch: 6   Global Step: 80840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:21:22,804-Speed 3049.99 samples/sec   Loss 9.8232   LearningRate 0.0455   Epoch: 6   Global Step: 80850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:21:26,189-Speed 3025.99 samples/sec   Loss 9.9121   LearningRate 0.0455   Epoch: 6   Global Step: 80860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:21:29,517-Speed 3078.02 samples/sec   Loss 9.7818   LearningRate 0.0455   Epoch: 6   Global Step: 80870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 09:21:32,885-Speed 3040.42 samples/sec   Loss 9.6713   LearningRate 0.0455   Epoch: 6   Global Step: 80880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:21:36,390-Speed 2923.44 samples/sec   Loss 9.9743   LearningRate 0.0455   Epoch: 6   Global Step: 80890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:21:39,809-Speed 2997.78 samples/sec   Loss 9.9061   LearningRate 0.0455   Epoch: 6   Global Step: 80900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:21:43,258-Speed 2969.53 samples/sec   Loss 10.0155   LearningRate 0.0455   Epoch: 6   Global Step: 80910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:21:46,599-Speed 3065.61 samples/sec   Loss 9.7953   LearningRate 0.0455   Epoch: 6   Global Step: 80920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:21:50,007-Speed 3005.34 samples/sec   Loss 9.9444   LearningRate 0.0455   Epoch: 6   Global Step: 80930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:21:53,318-Speed 3094.35 samples/sec   Loss 9.9023   LearningRate 0.0455   Epoch: 6   Global Step: 80940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:21:56,690-Speed 3037.45 samples/sec   Loss 9.9389   LearningRate 0.0454   Epoch: 6   Global Step: 80950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:22:00,040-Speed 3056.89 samples/sec   Loss 9.8309   LearningRate 0.0454   Epoch: 6   Global Step: 80960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:22:03,459-Speed 2996.08 samples/sec   Loss 9.9556   LearningRate 0.0454   Epoch: 6   Global Step: 80970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:22:06,833-Speed 3035.66 samples/sec   Loss 9.9730   LearningRate 0.0454   Epoch: 6   Global Step: 80980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:22:10,230-Speed 3015.64 samples/sec   Loss 9.8291   LearningRate 0.0454   Epoch: 6   Global Step: 80990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:22:13,707-Speed 2945.71 samples/sec   Loss 9.7822   LearningRate 0.0454   Epoch: 6   Global Step: 81000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:22:17,038-Speed 3075.62 samples/sec   Loss 9.7428   LearningRate 0.0454   Epoch: 6   Global Step: 81010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:22:20,425-Speed 3023.93 samples/sec   Loss 9.9476   LearningRate 0.0454   Epoch: 6   Global Step: 81020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:22:23,798-Speed 3036.41 samples/sec   Loss 9.9012   LearningRate 0.0454   Epoch: 6   Global Step: 81030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:22:27,215-Speed 2998.22 samples/sec   Loss 9.9425   LearningRate 0.0454   Epoch: 6   Global Step: 81040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:22:30,622-Speed 3006.04 samples/sec   Loss 10.0235   LearningRate 0.0454   Epoch: 6   Global Step: 81050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:22:33,967-Speed 3062.68 samples/sec   Loss 9.7989   LearningRate 0.0454   Epoch: 6   Global Step: 81060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:22:37,409-Speed 2976.08 samples/sec   Loss 9.9324   LearningRate 0.0454   Epoch: 6   Global Step: 81070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:22:40,703-Speed 3108.94 samples/sec   Loss 9.7792   LearningRate 0.0454   Epoch: 6   Global Step: 81080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:22:44,084-Speed 3029.77 samples/sec   Loss 9.9850   LearningRate 0.0454   Epoch: 6   Global Step: 81090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:22:47,447-Speed 3045.28 samples/sec   Loss 9.8025   LearningRate 0.0454   Epoch: 6   Global Step: 81100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:22:50,798-Speed 3056.47 samples/sec   Loss 9.9064   LearningRate 0.0454   Epoch: 6   Global Step: 81110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:22:54,171-Speed 3037.09 samples/sec   Loss 10.0240   LearningRate 0.0454   Epoch: 6   Global Step: 81120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:22:57,487-Speed 3088.96 samples/sec   Loss 9.8117   LearningRate 0.0453   Epoch: 6   Global Step: 81130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:00,855-Speed 3041.40 samples/sec   Loss 9.7351   LearningRate 0.0453   Epoch: 6   Global Step: 81140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:04,212-Speed 3051.48 samples/sec   Loss 9.5927   LearningRate 0.0453   Epoch: 6   Global Step: 81150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:07,558-Speed 3060.87 samples/sec   Loss 9.9222   LearningRate 0.0453   Epoch: 6   Global Step: 81160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:10,915-Speed 3051.83 samples/sec   Loss 9.8944   LearningRate 0.0453   Epoch: 6   Global Step: 81170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:14,317-Speed 3011.04 samples/sec   Loss 9.8539   LearningRate 0.0453   Epoch: 6   Global Step: 81180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:23:17,687-Speed 3038.66 samples/sec   Loss 9.8696   LearningRate 0.0453   Epoch: 6   Global Step: 81190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:23:21,059-Speed 3037.55 samples/sec   Loss 9.9418   LearningRate 0.0453   Epoch: 6   Global Step: 81200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:23:24,415-Speed 3052.70 samples/sec   Loss 10.0057   LearningRate 0.0453   Epoch: 6   Global Step: 81210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:23:27,889-Speed 2948.81 samples/sec   Loss 9.8763   LearningRate 0.0453   Epoch: 6   Global Step: 81220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:23:31,276-Speed 3023.37 samples/sec   Loss 9.9597   LearningRate 0.0453   Epoch: 6   Global Step: 81230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:34,710-Speed 2982.62 samples/sec   Loss 9.8066   LearningRate 0.0453   Epoch: 6   Global Step: 81240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:38,049-Speed 3068.14 samples/sec   Loss 9.8529   LearningRate 0.0453   Epoch: 6   Global Step: 81250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:41,545-Speed 2930.29 samples/sec   Loss 9.8039   LearningRate 0.0453   Epoch: 6   Global Step: 81260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:44,979-Speed 2982.08 samples/sec   Loss 9.9378   LearningRate 0.0453   Epoch: 6   Global Step: 81270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:48,402-Speed 2992.68 samples/sec   Loss 9.8591   LearningRate 0.0453   Epoch: 6   Global Step: 81280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:51,847-Speed 2973.90 samples/sec   Loss 9.7574   LearningRate 0.0453   Epoch: 6   Global Step: 81290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:55,264-Speed 2997.11 samples/sec   Loss 9.8658   LearningRate 0.0453   Epoch: 6   Global Step: 81300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:23:58,701-Speed 2980.22 samples/sec   Loss 9.7620   LearningRate 0.0453   Epoch: 6   Global Step: 81310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:24:02,105-Speed 3009.48 samples/sec   Loss 9.9030   LearningRate 0.0452   Epoch: 6   Global Step: 81320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:24:05,454-Speed 3058.69 samples/sec   Loss 9.9431   LearningRate 0.0452   Epoch: 6   Global Step: 81330   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:08,902-Speed 2970.72 samples/sec   Loss 9.9216   LearningRate 0.0452   Epoch: 6   Global Step: 81340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:12,308-Speed 3007.63 samples/sec   Loss 9.9276   LearningRate 0.0452   Epoch: 6   Global Step: 81350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:15,663-Speed 3052.45 samples/sec   Loss 9.8654   LearningRate 0.0452   Epoch: 6   Global Step: 81360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:19,028-Speed 3043.91 samples/sec   Loss 9.8599   LearningRate 0.0452   Epoch: 6   Global Step: 81370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:22,392-Speed 3044.81 samples/sec   Loss 9.8190   LearningRate 0.0452   Epoch: 6   Global Step: 81380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:25,763-Speed 3039.01 samples/sec   Loss 9.9401   LearningRate 0.0452   Epoch: 6   Global Step: 81390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:29,102-Speed 3068.26 samples/sec   Loss 9.9883   LearningRate 0.0452   Epoch: 6   Global Step: 81400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:32,421-Speed 3085.77 samples/sec   Loss 9.6822   LearningRate 0.0452   Epoch: 6   Global Step: 81410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:35,791-Speed 3039.41 samples/sec   Loss 9.8958   LearningRate 0.0452   Epoch: 6   Global Step: 81420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:39,193-Speed 3011.46 samples/sec   Loss 9.8655   LearningRate 0.0452   Epoch: 6   Global Step: 81430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:42,613-Speed 2995.17 samples/sec   Loss 9.9250   LearningRate 0.0452   Epoch: 6   Global Step: 81440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:46,036-Speed 2992.05 samples/sec   Loss 9.8022   LearningRate 0.0452   Epoch: 6   Global Step: 81450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:49,451-Speed 2999.80 samples/sec   Loss 9.7725   LearningRate 0.0452   Epoch: 6   Global Step: 81460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:52,803-Speed 3055.68 samples/sec   Loss 9.7485   LearningRate 0.0452   Epoch: 6   Global Step: 81470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:56,217-Speed 3000.65 samples/sec   Loss 9.8524   LearningRate 0.0452   Epoch: 6   Global Step: 81480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:24:59,569-Speed 3055.58 samples/sec   Loss 9.8835   LearningRate 0.0452   Epoch: 6   Global Step: 81490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:25:03,020-Speed 2967.81 samples/sec   Loss 9.7697   LearningRate 0.0451   Epoch: 6   Global Step: 81500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:25:06,389-Speed 3040.69 samples/sec   Loss 9.7130   LearningRate 0.0451   Epoch: 6   Global Step: 81510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:25:09,791-Speed 3010.77 samples/sec   Loss 9.8253   LearningRate 0.0451   Epoch: 6   Global Step: 81520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:25:13,139-Speed 3059.74 samples/sec   Loss 9.7594   LearningRate 0.0451   Epoch: 6   Global Step: 81530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:25:16,633-Speed 2932.43 samples/sec   Loss 9.8023   LearningRate 0.0451   Epoch: 6   Global Step: 81540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:25:20,001-Speed 3040.57 samples/sec   Loss 9.8520   LearningRate 0.0451   Epoch: 6   Global Step: 81550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:25:23,428-Speed 2989.17 samples/sec   Loss 9.8438   LearningRate 0.0451   Epoch: 6   Global Step: 81560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:25:26,836-Speed 3006.30 samples/sec   Loss 9.8349   LearningRate 0.0451   Epoch: 6   Global Step: 81570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:25:30,148-Speed 3093.21 samples/sec   Loss 9.7642   LearningRate 0.0451   Epoch: 6   Global Step: 81580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:25:33,593-Speed 2973.10 samples/sec   Loss 9.7942   LearningRate 0.0451   Epoch: 6   Global Step: 81590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:25:37,034-Speed 2976.66 samples/sec   Loss 9.8904   LearningRate 0.0451   Epoch: 6   Global Step: 81600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:25:40,522-Speed 2936.11 samples/sec   Loss 9.8992   LearningRate 0.0451   Epoch: 6   Global Step: 81610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:25:43,935-Speed 3001.34 samples/sec   Loss 9.9066   LearningRate 0.0451   Epoch: 6   Global Step: 81620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:25:47,311-Speed 3034.25 samples/sec   Loss 9.8103   LearningRate 0.0451   Epoch: 6   Global Step: 81630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:25:50,736-Speed 2990.23 samples/sec   Loss 9.8012   LearningRate 0.0451   Epoch: 6   Global Step: 81640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:25:54,169-Speed 2984.22 samples/sec   Loss 9.7150   LearningRate 0.0451   Epoch: 6   Global Step: 81650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:25:57,580-Speed 3002.41 samples/sec   Loss 9.8334   LearningRate 0.0451   Epoch: 6   Global Step: 81660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:00,958-Speed 3031.85 samples/sec   Loss 9.8646   LearningRate 0.0451   Epoch: 6   Global Step: 81670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:04,342-Speed 3027.77 samples/sec   Loss 9.7748   LearningRate 0.0451   Epoch: 6   Global Step: 81680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:26:07,723-Speed 3029.20 samples/sec   Loss 9.8152   LearningRate 0.0450   Epoch: 6   Global Step: 81690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:26:11,136-Speed 3001.78 samples/sec   Loss 9.8601   LearningRate 0.0450   Epoch: 6   Global Step: 81700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:14,586-Speed 2969.06 samples/sec   Loss 9.8618   LearningRate 0.0450   Epoch: 6   Global Step: 81710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:17,945-Speed 3049.02 samples/sec   Loss 9.8100   LearningRate 0.0450   Epoch: 6   Global Step: 81720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:21,342-Speed 3015.43 samples/sec   Loss 9.7499   LearningRate 0.0450   Epoch: 6   Global Step: 81730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:24,772-Speed 2986.54 samples/sec   Loss 9.9438   LearningRate 0.0450   Epoch: 6   Global Step: 81740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:28,170-Speed 3015.05 samples/sec   Loss 9.8219   LearningRate 0.0450   Epoch: 6   Global Step: 81750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:31,642-Speed 2950.15 samples/sec   Loss 9.8867   LearningRate 0.0450   Epoch: 6   Global Step: 81760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:35,021-Speed 3030.50 samples/sec   Loss 9.7620   LearningRate 0.0450   Epoch: 6   Global Step: 81770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:38,508-Speed 2937.39 samples/sec   Loss 9.8517   LearningRate 0.0450   Epoch: 6   Global Step: 81780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:41,900-Speed 3020.25 samples/sec   Loss 9.7934   LearningRate 0.0450   Epoch: 6   Global Step: 81790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:45,288-Speed 3023.50 samples/sec   Loss 9.9655   LearningRate 0.0450   Epoch: 6   Global Step: 81800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:26:48,656-Speed 3041.45 samples/sec   Loss 9.8560   LearningRate 0.0450   Epoch: 6   Global Step: 81810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:52,003-Speed 3060.19 samples/sec   Loss 9.6061   LearningRate 0.0450   Epoch: 6   Global Step: 81820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:55,411-Speed 3005.66 samples/sec   Loss 9.9837   LearningRate 0.0450   Epoch: 6   Global Step: 81830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:26:58,839-Speed 2988.39 samples/sec   Loss 9.7736   LearningRate 0.0450   Epoch: 6   Global Step: 81840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:27:02,231-Speed 3019.08 samples/sec   Loss 9.9156   LearningRate 0.0450   Epoch: 6   Global Step: 81850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:27:05,634-Speed 3010.71 samples/sec   Loss 9.6171   LearningRate 0.0450   Epoch: 6   Global Step: 81860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:27:09,045-Speed 3003.02 samples/sec   Loss 9.7061   LearningRate 0.0449   Epoch: 6   Global Step: 81870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:27:12,463-Speed 2996.16 samples/sec   Loss 9.8300   LearningRate 0.0449   Epoch: 6   Global Step: 81880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:27:15,832-Speed 3040.58 samples/sec   Loss 9.9236   LearningRate 0.0449   Epoch: 6   Global Step: 81890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:27:19,282-Speed 2968.26 samples/sec   Loss 9.8619   LearningRate 0.0449   Epoch: 6   Global Step: 81900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:27:22,761-Speed 2944.35 samples/sec   Loss 9.7810   LearningRate 0.0449   Epoch: 6   Global Step: 81910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:27:26,222-Speed 2959.76 samples/sec   Loss 9.8425   LearningRate 0.0449   Epoch: 6   Global Step: 81920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:27:29,673-Speed 2967.72 samples/sec   Loss 9.8770   LearningRate 0.0449   Epoch: 6   Global Step: 81930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:27:33,029-Speed 3052.68 samples/sec   Loss 9.9932   LearningRate 0.0449   Epoch: 6   Global Step: 81940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:27:36,435-Speed 3007.23 samples/sec   Loss 9.7433   LearningRate 0.0449   Epoch: 6   Global Step: 81950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:27:39,890-Speed 2964.20 samples/sec   Loss 9.7382   LearningRate 0.0449   Epoch: 6   Global Step: 81960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:27:43,324-Speed 2982.92 samples/sec   Loss 9.9012   LearningRate 0.0449   Epoch: 6   Global Step: 81970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:27:46,738-Speed 3000.96 samples/sec   Loss 9.8182   LearningRate 0.0449   Epoch: 6   Global Step: 81980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:27:50,136-Speed 3014.47 samples/sec   Loss 9.8802   LearningRate 0.0449   Epoch: 6   Global Step: 81990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:27:53,561-Speed 2990.27 samples/sec   Loss 9.7372   LearningRate 0.0449   Epoch: 6   Global Step: 82000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:27:56,919-Speed 3049.78 samples/sec   Loss 9.7345   LearningRate 0.0449   Epoch: 6   Global Step: 82010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:00,268-Speed 3058.70 samples/sec   Loss 9.7644   LearningRate 0.0449   Epoch: 6   Global Step: 82020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:03,589-Speed 3085.18 samples/sec   Loss 9.8388   LearningRate 0.0449   Epoch: 6   Global Step: 82030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:06,894-Speed 3098.53 samples/sec   Loss 9.7293   LearningRate 0.0449   Epoch: 6   Global Step: 82040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:10,257-Speed 3046.10 samples/sec   Loss 9.8118   LearningRate 0.0449   Epoch: 6   Global Step: 82050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:13,574-Speed 3089.05 samples/sec   Loss 9.8318   LearningRate 0.0448   Epoch: 6   Global Step: 82060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:16,926-Speed 3055.34 samples/sec   Loss 9.8148   LearningRate 0.0448   Epoch: 6   Global Step: 82070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:20,334-Speed 3005.15 samples/sec   Loss 9.7919   LearningRate 0.0448   Epoch: 6   Global Step: 82080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:23,638-Speed 3100.71 samples/sec   Loss 9.8566   LearningRate 0.0448   Epoch: 6   Global Step: 82090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:26,953-Speed 3089.47 samples/sec   Loss 9.8298   LearningRate 0.0448   Epoch: 6   Global Step: 82100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:28:30,357-Speed 3009.45 samples/sec   Loss 9.9397   LearningRate 0.0448   Epoch: 6   Global Step: 82110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:33,708-Speed 3056.39 samples/sec   Loss 9.8034   LearningRate 0.0448   Epoch: 6   Global Step: 82120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:37,103-Speed 3016.99 samples/sec   Loss 9.8336   LearningRate 0.0448   Epoch: 6   Global Step: 82130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:40,554-Speed 2968.15 samples/sec   Loss 9.9021   LearningRate 0.0448   Epoch: 6   Global Step: 82140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:43,931-Speed 3033.17 samples/sec   Loss 9.7170   LearningRate 0.0448   Epoch: 6   Global Step: 82150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:47,247-Speed 3088.47 samples/sec   Loss 9.8138   LearningRate 0.0448   Epoch: 6   Global Step: 82160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:50,626-Speed 3032.23 samples/sec   Loss 9.6929   LearningRate 0.0448   Epoch: 6   Global Step: 82170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:54,042-Speed 2998.31 samples/sec   Loss 9.7222   LearningRate 0.0448   Epoch: 6   Global Step: 82180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:28:57,428-Speed 3024.88 samples/sec   Loss 9.7019   LearningRate 0.0448   Epoch: 6   Global Step: 82190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:00,818-Speed 3021.17 samples/sec   Loss 9.7830   LearningRate 0.0448   Epoch: 6   Global Step: 82200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:04,264-Speed 2972.72 samples/sec   Loss 9.8291   LearningRate 0.0448   Epoch: 6   Global Step: 82210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:29:07,608-Speed 3063.29 samples/sec   Loss 9.8403   LearningRate 0.0448   Epoch: 6   Global Step: 82220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:29:11,005-Speed 3014.84 samples/sec   Loss 9.7197   LearningRate 0.0448   Epoch: 6   Global Step: 82230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:29:14,413-Speed 3005.70 samples/sec   Loss 9.9420   LearningRate 0.0447   Epoch: 6   Global Step: 82240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:17,870-Speed 2962.88 samples/sec   Loss 9.8742   LearningRate 0.0447   Epoch: 6   Global Step: 82250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:21,220-Speed 3058.60 samples/sec   Loss 9.8290   LearningRate 0.0447   Epoch: 6   Global Step: 82260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:24,601-Speed 3029.08 samples/sec   Loss 9.7106   LearningRate 0.0447   Epoch: 6   Global Step: 82270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:27,928-Speed 3078.60 samples/sec   Loss 9.9491   LearningRate 0.0447   Epoch: 6   Global Step: 82280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:31,298-Speed 3039.45 samples/sec   Loss 9.8990   LearningRate 0.0447   Epoch: 6   Global Step: 82290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:34,659-Speed 3047.97 samples/sec   Loss 9.7581   LearningRate 0.0447   Epoch: 6   Global Step: 82300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:37,979-Speed 3085.20 samples/sec   Loss 9.8474   LearningRate 0.0447   Epoch: 6   Global Step: 82310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:41,365-Speed 3024.84 samples/sec   Loss 9.7141   LearningRate 0.0447   Epoch: 6   Global Step: 82320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:44,798-Speed 2983.60 samples/sec   Loss 9.7215   LearningRate 0.0447   Epoch: 6   Global Step: 82330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:48,166-Speed 3041.43 samples/sec   Loss 9.7928   LearningRate 0.0447   Epoch: 6   Global Step: 82340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:29:51,491-Speed 3081.18 samples/sec   Loss 9.7859   LearningRate 0.0447   Epoch: 6   Global Step: 82350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:54,842-Speed 3056.50 samples/sec   Loss 9.8739   LearningRate 0.0447   Epoch: 6   Global Step: 82360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:29:58,254-Speed 3001.59 samples/sec   Loss 9.6801   LearningRate 0.0447   Epoch: 6   Global Step: 82370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:30:01,678-Speed 2991.45 samples/sec   Loss 9.8270   LearningRate 0.0447   Epoch: 6   Global Step: 82380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:30:05,036-Speed 3050.87 samples/sec   Loss 9.6994   LearningRate 0.0447   Epoch: 6   Global Step: 82390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:30:08,442-Speed 3007.27 samples/sec   Loss 9.8301   LearningRate 0.0447   Epoch: 6   Global Step: 82400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:30:11,801-Speed 3048.66 samples/sec   Loss 9.8519   LearningRate 0.0447   Epoch: 6   Global Step: 82410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:30:15,119-Speed 3087.63 samples/sec   Loss 9.6506   LearningRate 0.0447   Epoch: 6   Global Step: 82420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:30:18,471-Speed 3055.84 samples/sec   Loss 9.6624   LearningRate 0.0446   Epoch: 6   Global Step: 82430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:30:21,830-Speed 3048.66 samples/sec   Loss 9.6897   LearningRate 0.0446   Epoch: 6   Global Step: 82440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:30:25,189-Speed 3050.11 samples/sec   Loss 9.7483   LearningRate 0.0446   Epoch: 6   Global Step: 82450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:30:28,520-Speed 3074.72 samples/sec   Loss 9.6448   LearningRate 0.0446   Epoch: 6   Global Step: 82460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:30:31,841-Speed 3084.74 samples/sec   Loss 9.8070   LearningRate 0.0446   Epoch: 6   Global Step: 82470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:30:35,299-Speed 2961.99 samples/sec   Loss 9.6670   LearningRate 0.0446   Epoch: 6   Global Step: 82480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:30:38,706-Speed 3006.43 samples/sec   Loss 9.8398   LearningRate 0.0446   Epoch: 6   Global Step: 82490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:30:42,065-Speed 3049.43 samples/sec   Loss 9.7466   LearningRate 0.0446   Epoch: 6   Global Step: 82500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:30:45,417-Speed 3055.83 samples/sec   Loss 9.7428   LearningRate 0.0446   Epoch: 6   Global Step: 82510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:30:48,815-Speed 3014.48 samples/sec   Loss 9.9718   LearningRate 0.0446   Epoch: 6   Global Step: 82520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:30:52,150-Speed 3071.94 samples/sec   Loss 9.8265   LearningRate 0.0446   Epoch: 6   Global Step: 82530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:30:55,505-Speed 3052.75 samples/sec   Loss 9.9564   LearningRate 0.0446   Epoch: 6   Global Step: 82540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:30:58,892-Speed 3024.40 samples/sec   Loss 9.8790   LearningRate 0.0446   Epoch: 6   Global Step: 82550   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-04-27 09:31:02,213-Speed 3084.19 samples/sec   Loss 9.8043   LearningRate 0.0446   Epoch: 6   Global Step: 82560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:31:05,516-Speed 3101.43 samples/sec   Loss 9.7167   LearningRate 0.0446   Epoch: 6   Global Step: 82570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:31:08,890-Speed 3035.76 samples/sec   Loss 9.7685   LearningRate 0.0446   Epoch: 6   Global Step: 82580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:31:12,239-Speed 3058.43 samples/sec   Loss 9.6987   LearningRate 0.0446   Epoch: 6   Global Step: 82590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:31:15,586-Speed 3060.15 samples/sec   Loss 9.7313   LearningRate 0.0446   Epoch: 6   Global Step: 82600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:31:18,926-Speed 3066.96 samples/sec   Loss 9.6786   LearningRate 0.0446   Epoch: 6   Global Step: 82610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:31:22,312-Speed 3025.33 samples/sec   Loss 9.8181   LearningRate 0.0445   Epoch: 6   Global Step: 82620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:31:25,735-Speed 2992.01 samples/sec   Loss 9.8660   LearningRate 0.0445   Epoch: 6   Global Step: 82630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:31:29,121-Speed 3025.17 samples/sec   Loss 9.7739   LearningRate 0.0445   Epoch: 6   Global Step: 82640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:31:32,513-Speed 3020.25 samples/sec   Loss 9.7124   LearningRate 0.0445   Epoch: 6   Global Step: 82650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:31:35,883-Speed 3039.02 samples/sec   Loss 9.5441   LearningRate 0.0445   Epoch: 6   Global Step: 82660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:31:39,215-Speed 3074.10 samples/sec   Loss 9.6621   LearningRate 0.0445   Epoch: 6   Global Step: 82670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:31:42,599-Speed 3027.21 samples/sec   Loss 9.8367   LearningRate 0.0445   Epoch: 6   Global Step: 82680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:31:45,977-Speed 3031.98 samples/sec   Loss 9.7390   LearningRate 0.0445   Epoch: 6   Global Step: 82690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:31:49,308-Speed 3075.27 samples/sec   Loss 9.7824   LearningRate 0.0445   Epoch: 6   Global Step: 82700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:31:52,737-Speed 2987.22 samples/sec   Loss 9.8047   LearningRate 0.0445   Epoch: 6   Global Step: 82710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:31:56,133-Speed 3015.58 samples/sec   Loss 9.7261   LearningRate 0.0445   Epoch: 6   Global Step: 82720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:31:59,488-Speed 3053.69 samples/sec   Loss 9.7911   LearningRate 0.0445   Epoch: 6   Global Step: 82730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:32:02,828-Speed 3066.65 samples/sec   Loss 9.6781   LearningRate 0.0445   Epoch: 6   Global Step: 82740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:32:06,141-Speed 3091.41 samples/sec   Loss 9.7717   LearningRate 0.0445   Epoch: 6   Global Step: 82750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:32:09,499-Speed 3050.78 samples/sec   Loss 9.7712   LearningRate 0.0445   Epoch: 6   Global Step: 82760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:32:12,986-Speed 2937.70 samples/sec   Loss 9.6895   LearningRate 0.0445   Epoch: 6   Global Step: 82770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:32:16,383-Speed 3015.16 samples/sec   Loss 9.8024   LearningRate 0.0445   Epoch: 6   Global Step: 82780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:32:19,817-Speed 2983.03 samples/sec   Loss 9.8966   LearningRate 0.0445   Epoch: 6   Global Step: 82790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:32:23,191-Speed 3035.52 samples/sec   Loss 9.7819   LearningRate 0.0444   Epoch: 6   Global Step: 82800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:32:26,695-Speed 2923.65 samples/sec   Loss 9.8176   LearningRate 0.0444   Epoch: 6   Global Step: 82810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:32:30,097-Speed 3011.05 samples/sec   Loss 9.7011   LearningRate 0.0444   Epoch: 6   Global Step: 82820   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:32:33,566-Speed 2952.18 samples/sec   Loss 9.7058   LearningRate 0.0444   Epoch: 6   Global Step: 82830   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:32:36,995-Speed 2987.35 samples/sec   Loss 9.6685   LearningRate 0.0444   Epoch: 6   Global Step: 82840   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:32:40,305-Speed 3094.46 samples/sec   Loss 9.7546   LearningRate 0.0444   Epoch: 6   Global Step: 82850   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:32:43,641-Speed 3071.02 samples/sec   Loss 9.8023   LearningRate 0.0444   Epoch: 6   Global Step: 82860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:32:46,978-Speed 3069.39 samples/sec   Loss 9.7846   LearningRate 0.0444   Epoch: 6   Global Step: 82870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:32:50,429-Speed 2967.80 samples/sec   Loss 9.7333   LearningRate 0.0444   Epoch: 6   Global Step: 82880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:32:53,890-Speed 2959.87 samples/sec   Loss 9.6885   LearningRate 0.0444   Epoch: 6   Global Step: 82890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:32:57,274-Speed 3026.57 samples/sec   Loss 9.7130   LearningRate 0.0444   Epoch: 6   Global Step: 82900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:33:00,703-Speed 2987.06 samples/sec   Loss 9.7781   LearningRate 0.0444   Epoch: 6   Global Step: 82910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:33:04,143-Speed 2978.21 samples/sec   Loss 9.6479   LearningRate 0.0444   Epoch: 6   Global Step: 82920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:33:07,530-Speed 3023.85 samples/sec   Loss 9.8575   LearningRate 0.0444   Epoch: 6   Global Step: 82930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:33:10,991-Speed 2959.78 samples/sec   Loss 9.7912   LearningRate 0.0444   Epoch: 6   Global Step: 82940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:33:14,444-Speed 2966.69 samples/sec   Loss 9.8760   LearningRate 0.0444   Epoch: 6   Global Step: 82950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:33:17,803-Speed 3049.75 samples/sec   Loss 9.8397   LearningRate 0.0444   Epoch: 6   Global Step: 82960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:33:21,200-Speed 3015.22 samples/sec   Loss 9.8930   LearningRate 0.0444   Epoch: 6   Global Step: 82970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:33:24,556-Speed 3051.94 samples/sec   Loss 9.7362   LearningRate 0.0444   Epoch: 6   Global Step: 82980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:33:27,912-Speed 3052.25 samples/sec   Loss 9.8945   LearningRate 0.0443   Epoch: 6   Global Step: 82990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:33:31,308-Speed 3016.94 samples/sec   Loss 9.6735   LearningRate 0.0443   Epoch: 6   Global Step: 83000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:33:34,719-Speed 3002.61 samples/sec   Loss 9.6552   LearningRate 0.0443   Epoch: 6   Global Step: 83010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:33:38,189-Speed 2951.68 samples/sec   Loss 9.9245   LearningRate 0.0443   Epoch: 6   Global Step: 83020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:33:41,617-Speed 2988.60 samples/sec   Loss 9.7500   LearningRate 0.0443   Epoch: 6   Global Step: 83030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:33:45,067-Speed 2969.31 samples/sec   Loss 9.7096   LearningRate 0.0443   Epoch: 6   Global Step: 83040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:33:48,458-Speed 3020.49 samples/sec   Loss 9.8316   LearningRate 0.0443   Epoch: 6   Global Step: 83050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:33:51,881-Speed 2992.73 samples/sec   Loss 9.7340   LearningRate 0.0443   Epoch: 6   Global Step: 83060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:33:55,279-Speed 3014.69 samples/sec   Loss 9.8786   LearningRate 0.0443   Epoch: 6   Global Step: 83070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:33:58,695-Speed 2999.02 samples/sec   Loss 9.8211   LearningRate 0.0443   Epoch: 6   Global Step: 83080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:02,112-Speed 2997.72 samples/sec   Loss 9.6287   LearningRate 0.0443   Epoch: 6   Global Step: 83090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:05,476-Speed 3045.01 samples/sec   Loss 9.6756   LearningRate 0.0443   Epoch: 6   Global Step: 83100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:08,856-Speed 3030.56 samples/sec   Loss 9.6666   LearningRate 0.0443   Epoch: 6   Global Step: 83110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:12,289-Speed 2983.65 samples/sec   Loss 9.7164   LearningRate 0.0443   Epoch: 6   Global Step: 83120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:15,720-Speed 2985.55 samples/sec   Loss 9.8116   LearningRate 0.0443   Epoch: 6   Global Step: 83130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:19,046-Speed 3079.08 samples/sec   Loss 9.7455   LearningRate 0.0443   Epoch: 6   Global Step: 83140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:22,385-Speed 3067.98 samples/sec   Loss 9.6743   LearningRate 0.0443   Epoch: 6   Global Step: 83150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:25,795-Speed 3004.54 samples/sec   Loss 9.7693   LearningRate 0.0443   Epoch: 6   Global Step: 83160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:29,170-Speed 3035.36 samples/sec   Loss 9.7711   LearningRate 0.0442   Epoch: 6   Global Step: 83170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:34:32,573-Speed 3010.25 samples/sec   Loss 9.7870   LearningRate 0.0442   Epoch: 6   Global Step: 83180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:34:35,915-Speed 3065.32 samples/sec   Loss 9.6672   LearningRate 0.0442   Epoch: 6   Global Step: 83190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:39,230-Speed 3089.79 samples/sec   Loss 9.7679   LearningRate 0.0442   Epoch: 6   Global Step: 83200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:42,647-Speed 2997.38 samples/sec   Loss 9.6511   LearningRate 0.0442   Epoch: 6   Global Step: 83210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:46,043-Speed 3016.86 samples/sec   Loss 9.7629   LearningRate 0.0442   Epoch: 6   Global Step: 83220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:49,520-Speed 2946.07 samples/sec   Loss 9.7812   LearningRate 0.0442   Epoch: 6   Global Step: 83230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:52,897-Speed 3032.73 samples/sec   Loss 9.6953   LearningRate 0.0442   Epoch: 6   Global Step: 83240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:56,339-Speed 2976.71 samples/sec   Loss 9.6958   LearningRate 0.0442   Epoch: 6   Global Step: 83250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:34:59,709-Speed 3039.22 samples/sec   Loss 9.7933   LearningRate 0.0442   Epoch: 6   Global Step: 83260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:35:03,101-Speed 3019.65 samples/sec   Loss 9.6602   LearningRate 0.0442   Epoch: 6   Global Step: 83270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:35:06,466-Speed 3045.04 samples/sec   Loss 9.6984   LearningRate 0.0442   Epoch: 6   Global Step: 83280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:35:09,822-Speed 3051.70 samples/sec   Loss 9.6941   LearningRate 0.0442   Epoch: 6   Global Step: 83290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 09:35:13,233-Speed 3002.67 samples/sec   Loss 9.6906   LearningRate 0.0442   Epoch: 6   Global Step: 83300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:35:16,673-Speed 2977.78 samples/sec   Loss 9.6801   LearningRate 0.0442   Epoch: 6   Global Step: 83310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:35:20,059-Speed 3025.23 samples/sec   Loss 9.5980   LearningRate 0.0442   Epoch: 6   Global Step: 83320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 09:35:23,478-Speed 2995.10 samples/sec   Loss 9.7321   LearningRate 0.0442   Epoch: 6   Global Step: 83330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:35:26,856-Speed 3032.20 samples/sec   Loss 9.7236   LearningRate 0.0442   Epoch: 6   Global Step: 83340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:35:30,209-Speed 3054.97 samples/sec   Loss 9.7578   LearningRate 0.0442   Epoch: 6   Global Step: 83350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:35:33,581-Speed 3037.50 samples/sec   Loss 9.7039   LearningRate 0.0441   Epoch: 6   Global Step: 83360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:35:36,965-Speed 3027.80 samples/sec   Loss 9.6978   LearningRate 0.0441   Epoch: 6   Global Step: 83370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:35:40,354-Speed 3021.66 samples/sec   Loss 9.6224   LearningRate 0.0441   Epoch: 6   Global Step: 83380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:35:43,780-Speed 2990.11 samples/sec   Loss 9.6020   LearningRate 0.0441   Epoch: 6   Global Step: 83390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:35:47,241-Speed 2959.06 samples/sec   Loss 9.7015   LearningRate 0.0441   Epoch: 6   Global Step: 83400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:35:50,611-Speed 3039.57 samples/sec   Loss 9.7224   LearningRate 0.0441   Epoch: 6   Global Step: 83410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:35:54,083-Speed 2949.99 samples/sec   Loss 9.9047   LearningRate 0.0441   Epoch: 6   Global Step: 83420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:35:57,438-Speed 3053.44 samples/sec   Loss 9.6433   LearningRate 0.0441   Epoch: 6   Global Step: 83430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:36:00,814-Speed 3034.07 samples/sec   Loss 9.6075   LearningRate 0.0441   Epoch: 6   Global Step: 83440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:36:04,255-Speed 2977.32 samples/sec   Loss 9.8044   LearningRate 0.0441   Epoch: 6   Global Step: 83450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:36:07,765-Speed 2918.34 samples/sec   Loss 9.8074   LearningRate 0.0441   Epoch: 6   Global Step: 83460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:36:11,152-Speed 3023.79 samples/sec   Loss 9.7239   LearningRate 0.0441   Epoch: 6   Global Step: 83470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:36:14,588-Speed 2981.36 samples/sec   Loss 9.8296   LearningRate 0.0441   Epoch: 6   Global Step: 83480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:36:17,911-Speed 3082.60 samples/sec   Loss 9.8217   LearningRate 0.0441   Epoch: 6   Global Step: 83490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:36:21,243-Speed 3074.36 samples/sec   Loss 9.6208   LearningRate 0.0441   Epoch: 6   Global Step: 83500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:36:24,627-Speed 3026.96 samples/sec   Loss 9.6540   LearningRate 0.0441   Epoch: 6   Global Step: 83510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:36:28,066-Speed 2977.71 samples/sec   Loss 9.7093   LearningRate 0.0441   Epoch: 6   Global Step: 83520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:36:31,572-Speed 2922.43 samples/sec   Loss 9.6464   LearningRate 0.0441   Epoch: 6   Global Step: 83530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:36:35,020-Speed 2970.54 samples/sec   Loss 9.6401   LearningRate 0.0441   Epoch: 6   Global Step: 83540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:36:38,440-Speed 2995.18 samples/sec   Loss 9.6447   LearningRate 0.0440   Epoch: 6   Global Step: 83550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:36:41,793-Speed 3054.25 samples/sec   Loss 9.8059   LearningRate 0.0440   Epoch: 6   Global Step: 83560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:36:45,113-Speed 3085.94 samples/sec   Loss 9.7815   LearningRate 0.0440   Epoch: 6   Global Step: 83570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:36:48,475-Speed 3046.21 samples/sec   Loss 9.6619   LearningRate 0.0440   Epoch: 6   Global Step: 83580   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:36:51,827-Speed 3055.80 samples/sec   Loss 9.7820   LearningRate 0.0440   Epoch: 6   Global Step: 83590   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:36:55,221-Speed 3017.86 samples/sec   Loss 9.5901   LearningRate 0.0440   Epoch: 6   Global Step: 83600   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:36:58,601-Speed 3030.66 samples/sec   Loss 9.7114   LearningRate 0.0440   Epoch: 6   Global Step: 83610   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:37:02,022-Speed 2994.39 samples/sec   Loss 9.6113   LearningRate 0.0440   Epoch: 6   Global Step: 83620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:37:05,371-Speed 3057.82 samples/sec   Loss 9.6793   LearningRate 0.0440   Epoch: 6   Global Step: 83630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:37:08,716-Speed 3062.54 samples/sec   Loss 9.7147   LearningRate 0.0440   Epoch: 6   Global Step: 83640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:37:12,093-Speed 3033.68 samples/sec   Loss 9.7576   LearningRate 0.0440   Epoch: 6   Global Step: 83650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:37:15,500-Speed 3005.76 samples/sec   Loss 9.6405   LearningRate 0.0440   Epoch: 6   Global Step: 83660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:37:18,960-Speed 2960.76 samples/sec   Loss 9.6364   LearningRate 0.0440   Epoch: 6   Global Step: 83670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:37:22,353-Speed 3018.63 samples/sec   Loss 9.7675   LearningRate 0.0440   Epoch: 6   Global Step: 83680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:37:25,760-Speed 3006.35 samples/sec   Loss 9.6339   LearningRate 0.0440   Epoch: 6   Global Step: 83690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:37:29,119-Speed 3050.09 samples/sec   Loss 9.5112   LearningRate 0.0440   Epoch: 6   Global Step: 83700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:37:32,486-Speed 3042.62 samples/sec   Loss 9.6515   LearningRate 0.0440   Epoch: 6   Global Step: 83710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:37:35,914-Speed 2987.66 samples/sec   Loss 9.6531   LearningRate 0.0440   Epoch: 6   Global Step: 83720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:37:39,358-Speed 2974.64 samples/sec   Loss 9.6100   LearningRate 0.0440   Epoch: 6   Global Step: 83730   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:37:42,730-Speed 3037.27 samples/sec   Loss 9.5506   LearningRate 0.0439   Epoch: 6   Global Step: 83740   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:37:46,191-Speed 2959.38 samples/sec   Loss 9.6223   LearningRate 0.0439   Epoch: 6   Global Step: 83750   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:37:49,650-Speed 2961.46 samples/sec   Loss 9.7890   LearningRate 0.0439   Epoch: 6   Global Step: 83760   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:37:53,031-Speed 3029.21 samples/sec   Loss 9.7306   LearningRate 0.0439   Epoch: 6   Global Step: 83770   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:37:56,347-Speed 3088.70 samples/sec   Loss 9.6215   LearningRate 0.0439   Epoch: 6   Global Step: 83780   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:37:59,801-Speed 2966.07 samples/sec   Loss 9.7709   LearningRate 0.0439   Epoch: 6   Global Step: 83790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:03,196-Speed 3016.89 samples/sec   Loss 9.6865   LearningRate 0.0439   Epoch: 6   Global Step: 83800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:06,543-Speed 3060.42 samples/sec   Loss 9.7566   LearningRate 0.0439   Epoch: 6   Global Step: 83810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:11,086-Speed 2254.22 samples/sec   Loss 9.7235   LearningRate 0.0439   Epoch: 6   Global Step: 83820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:14,428-Speed 3065.59 samples/sec   Loss 9.5953   LearningRate 0.0439   Epoch: 6   Global Step: 83830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:17,855-Speed 2988.89 samples/sec   Loss 9.7141   LearningRate 0.0439   Epoch: 6   Global Step: 83840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:21,196-Speed 3065.82 samples/sec   Loss 9.5991   LearningRate 0.0439   Epoch: 6   Global Step: 83850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:24,558-Speed 3047.84 samples/sec   Loss 9.8023   LearningRate 0.0439   Epoch: 6   Global Step: 83860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:27,985-Speed 2988.78 samples/sec   Loss 9.5281   LearningRate 0.0439   Epoch: 6   Global Step: 83870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:31,346-Speed 3047.86 samples/sec   Loss 9.8355   LearningRate 0.0439   Epoch: 6   Global Step: 83880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:34,750-Speed 3009.32 samples/sec   Loss 9.6993   LearningRate 0.0439   Epoch: 6   Global Step: 83890   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:38:38,082-Speed 3075.03 samples/sec   Loss 9.8579   LearningRate 0.0439   Epoch: 6   Global Step: 83900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:41,551-Speed 2952.83 samples/sec   Loss 9.7327   LearningRate 0.0439   Epoch: 6   Global Step: 83910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:45,050-Speed 2927.20 samples/sec   Loss 9.7982   LearningRate 0.0438   Epoch: 6   Global Step: 83920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:48,481-Speed 2985.83 samples/sec   Loss 9.6218   LearningRate 0.0438   Epoch: 6   Global Step: 83930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:51,876-Speed 3017.59 samples/sec   Loss 9.6533   LearningRate 0.0438   Epoch: 6   Global Step: 83940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:55,229-Speed 3054.42 samples/sec   Loss 9.7785   LearningRate 0.0438   Epoch: 6   Global Step: 83950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:38:58,598-Speed 3039.96 samples/sec   Loss 9.6088   LearningRate 0.0438   Epoch: 6   Global Step: 83960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:39:01,960-Speed 3047.05 samples/sec   Loss 9.5784   LearningRate 0.0438   Epoch: 6   Global Step: 83970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:39:05,334-Speed 3035.60 samples/sec   Loss 9.6690   LearningRate 0.0438   Epoch: 6   Global Step: 83980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:39:08,742-Speed 3006.33 samples/sec   Loss 9.5447   LearningRate 0.0438   Epoch: 6   Global Step: 83990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:39:12,206-Speed 2956.14 samples/sec   Loss 9.6327   LearningRate 0.0438   Epoch: 6   Global Step: 84000   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:39:15,520-Speed 3091.11 samples/sec   Loss 9.6959   LearningRate 0.0438   Epoch: 6   Global Step: 84010   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:39:18,907-Speed 3024.08 samples/sec   Loss 9.5023   LearningRate 0.0438   Epoch: 6   Global Step: 84020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:39:22,326-Speed 2996.33 samples/sec   Loss 9.7550   LearningRate 0.0438   Epoch: 6   Global Step: 84030   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:39:25,775-Speed 2970.04 samples/sec   Loss 9.6975   LearningRate 0.0438   Epoch: 6   Global Step: 84040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:39:29,167-Speed 3019.36 samples/sec   Loss 9.7138   LearningRate 0.0438   Epoch: 6   Global Step: 84050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:39:32,532-Speed 3044.08 samples/sec   Loss 9.5024   LearningRate 0.0438   Epoch: 6   Global Step: 84060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:39:35,904-Speed 3036.97 samples/sec   Loss 9.6825   LearningRate 0.0438   Epoch: 6   Global Step: 84070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:39:39,356-Speed 2968.01 samples/sec   Loss 9.5656   LearningRate 0.0438   Epoch: 6   Global Step: 84080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:39:42,793-Speed 2980.34 samples/sec   Loss 9.6268   LearningRate 0.0438   Epoch: 6   Global Step: 84090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:39:46,208-Speed 2999.27 samples/sec   Loss 9.6480   LearningRate 0.0438   Epoch: 6   Global Step: 84100   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-04-27 09:39:49,543-Speed 3071.30 samples/sec   Loss 9.7119   LearningRate 0.0437   Epoch: 6   Global Step: 84110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:39:53,048-Speed 2921.85 samples/sec   Loss 9.5932   LearningRate 0.0437   Epoch: 6   Global Step: 84120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:39:56,583-Speed 2898.05 samples/sec   Loss 9.7702   LearningRate 0.0437   Epoch: 6   Global Step: 84130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:00,034-Speed 2967.82 samples/sec   Loss 9.7324   LearningRate 0.0437   Epoch: 6   Global Step: 84140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:03,485-Speed 2968.04 samples/sec   Loss 9.6847   LearningRate 0.0437   Epoch: 6   Global Step: 84150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:06,859-Speed 3036.05 samples/sec   Loss 9.6227   LearningRate 0.0437   Epoch: 6   Global Step: 84160   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:10,211-Speed 3055.99 samples/sec   Loss 9.5896   LearningRate 0.0437   Epoch: 6   Global Step: 84170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:13,631-Speed 2995.11 samples/sec   Loss 9.6982   LearningRate 0.0437   Epoch: 6   Global Step: 84180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:16,940-Speed 3094.97 samples/sec   Loss 9.7074   LearningRate 0.0437   Epoch: 6   Global Step: 84190   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:20,278-Speed 3069.20 samples/sec   Loss 9.7084   LearningRate 0.0437   Epoch: 6   Global Step: 84200   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:23,681-Speed 3010.10 samples/sec   Loss 9.8006   LearningRate 0.0437   Epoch: 6   Global Step: 84210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:27,088-Speed 3006.81 samples/sec   Loss 9.5589   LearningRate 0.0437   Epoch: 6   Global Step: 84220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:30,512-Speed 2991.37 samples/sec   Loss 9.5839   LearningRate 0.0437   Epoch: 6   Global Step: 84230   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:33,826-Speed 3090.60 samples/sec   Loss 9.5900   LearningRate 0.0437   Epoch: 6   Global Step: 84240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:37,215-Speed 3022.29 samples/sec   Loss 9.7184   LearningRate 0.0437   Epoch: 6   Global Step: 84250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:40:40,589-Speed 3036.55 samples/sec   Loss 9.6812   LearningRate 0.0437   Epoch: 6   Global Step: 84260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:40:44,012-Speed 2992.28 samples/sec   Loss 9.7146   LearningRate 0.0437   Epoch: 6   Global Step: 84270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:40:47,347-Speed 3070.83 samples/sec   Loss 9.5794   LearningRate 0.0437   Epoch: 6   Global Step: 84280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:40:50,717-Speed 3039.45 samples/sec   Loss 9.6097   LearningRate 0.0437   Epoch: 6   Global Step: 84290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:40:54,123-Speed 3007.75 samples/sec   Loss 9.6741   LearningRate 0.0436   Epoch: 6   Global Step: 84300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:40:57,556-Speed 2983.45 samples/sec   Loss 9.6189   LearningRate 0.0436   Epoch: 6   Global Step: 84310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:41:00,945-Speed 3022.49 samples/sec   Loss 9.5700   LearningRate 0.0436   Epoch: 6   Global Step: 84320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:41:04,356-Speed 3002.94 samples/sec   Loss 9.5248   LearningRate 0.0436   Epoch: 6   Global Step: 84330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:41:07,692-Speed 3070.87 samples/sec   Loss 9.5672   LearningRate 0.0436   Epoch: 6   Global Step: 84340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:41:11,107-Speed 2998.93 samples/sec   Loss 9.7231   LearningRate 0.0436   Epoch: 6   Global Step: 84350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:41:14,518-Speed 3003.26 samples/sec   Loss 9.7142   LearningRate 0.0436   Epoch: 6   Global Step: 84360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:41:17,879-Speed 3047.08 samples/sec   Loss 9.6925   LearningRate 0.0436   Epoch: 6   Global Step: 84370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:41:21,194-Speed 3089.99 samples/sec   Loss 9.5902   LearningRate 0.0436   Epoch: 6   Global Step: 84380   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:41:24,559-Speed 3043.67 samples/sec   Loss 9.6896   LearningRate 0.0436   Epoch: 6   Global Step: 84390   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:41:27,896-Speed 3069.58 samples/sec   Loss 9.6415   LearningRate 0.0436   Epoch: 6   Global Step: 84400   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:41:31,265-Speed 3041.10 samples/sec   Loss 9.6527   LearningRate 0.0436   Epoch: 6   Global Step: 84410   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:41:34,667-Speed 3010.10 samples/sec   Loss 9.6579   LearningRate 0.0436   Epoch: 6   Global Step: 84420   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:41:38,112-Speed 2974.65 samples/sec   Loss 9.6135   LearningRate 0.0436   Epoch: 6   Global Step: 84430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:41:41,448-Speed 3069.54 samples/sec   Loss 9.6205   LearningRate 0.0436   Epoch: 6   Global Step: 84440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:41:44,794-Speed 3061.63 samples/sec   Loss 9.5928   LearningRate 0.0436   Epoch: 6   Global Step: 84450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:41:48,247-Speed 2966.89 samples/sec   Loss 9.7240   LearningRate 0.0436   Epoch: 6   Global Step: 84460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:41:51,580-Speed 3072.38 samples/sec   Loss 9.6714   LearningRate 0.0436   Epoch: 6   Global Step: 84470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:41:54,899-Speed 3086.07 samples/sec   Loss 9.7174   LearningRate 0.0436   Epoch: 6   Global Step: 84480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:41:58,320-Speed 2994.54 samples/sec   Loss 9.8488   LearningRate 0.0435   Epoch: 6   Global Step: 84490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:01,748-Speed 2988.00 samples/sec   Loss 9.5659   LearningRate 0.0435   Epoch: 6   Global Step: 84500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:05,101-Speed 3055.15 samples/sec   Loss 9.6686   LearningRate 0.0435   Epoch: 6   Global Step: 84510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:08,441-Speed 3066.60 samples/sec   Loss 9.7264   LearningRate 0.0435   Epoch: 6   Global Step: 84520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:11,769-Speed 3077.59 samples/sec   Loss 9.7019   LearningRate 0.0435   Epoch: 6   Global Step: 84530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:15,144-Speed 3035.61 samples/sec   Loss 9.5160   LearningRate 0.0435   Epoch: 6   Global Step: 84540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:18,519-Speed 3034.46 samples/sec   Loss 9.6842   LearningRate 0.0435   Epoch: 6   Global Step: 84550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:21,976-Speed 2962.46 samples/sec   Loss 9.6775   LearningRate 0.0435   Epoch: 6   Global Step: 84560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:25,375-Speed 3013.73 samples/sec   Loss 9.8331   LearningRate 0.0435   Epoch: 6   Global Step: 84570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:28,817-Speed 2975.67 samples/sec   Loss 9.7217   LearningRate 0.0435   Epoch: 6   Global Step: 84580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:32,197-Speed 3030.62 samples/sec   Loss 9.5682   LearningRate 0.0435   Epoch: 6   Global Step: 84590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:35,558-Speed 3047.49 samples/sec   Loss 9.5525   LearningRate 0.0435   Epoch: 6   Global Step: 84600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:38,974-Speed 2998.30 samples/sec   Loss 9.7469   LearningRate 0.0435   Epoch: 6   Global Step: 84610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:42,407-Speed 2984.40 samples/sec   Loss 9.6323   LearningRate 0.0435   Epoch: 6   Global Step: 84620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:45,804-Speed 3015.19 samples/sec   Loss 9.7090   LearningRate 0.0435   Epoch: 6   Global Step: 84630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:42:49,168-Speed 3044.46 samples/sec   Loss 9.6615   LearningRate 0.0435   Epoch: 6   Global Step: 84640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:52,581-Speed 3000.89 samples/sec   Loss 9.4822   LearningRate 0.0435   Epoch: 6   Global Step: 84650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:55,925-Speed 3063.45 samples/sec   Loss 9.6248   LearningRate 0.0435   Epoch: 6   Global Step: 84660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:42:59,322-Speed 3015.21 samples/sec   Loss 9.5806   LearningRate 0.0434   Epoch: 6   Global Step: 84670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:02,693-Speed 3038.79 samples/sec   Loss 9.7796   LearningRate 0.0434   Epoch: 6   Global Step: 84680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:06,049-Speed 3052.45 samples/sec   Loss 9.5150   LearningRate 0.0434   Epoch: 6   Global Step: 84690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:09,433-Speed 3026.48 samples/sec   Loss 9.6263   LearningRate 0.0434   Epoch: 6   Global Step: 84700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:12,804-Speed 3038.94 samples/sec   Loss 9.8756   LearningRate 0.0434   Epoch: 6   Global Step: 84710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:16,135-Speed 3075.25 samples/sec   Loss 9.5334   LearningRate 0.0434   Epoch: 6   Global Step: 84720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:19,551-Speed 2997.81 samples/sec   Loss 9.6695   LearningRate 0.0434   Epoch: 6   Global Step: 84730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:22,902-Speed 3056.97 samples/sec   Loss 9.6893   LearningRate 0.0434   Epoch: 6   Global Step: 84740   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:43:26,325-Speed 2992.33 samples/sec   Loss 9.7348   LearningRate 0.0434   Epoch: 6   Global Step: 84750   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:43:29,694-Speed 3040.17 samples/sec   Loss 9.6177   LearningRate 0.0434   Epoch: 6   Global Step: 84760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:33,066-Speed 3037.96 samples/sec   Loss 9.6835   LearningRate 0.0434   Epoch: 6   Global Step: 84770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:36,458-Speed 3019.61 samples/sec   Loss 9.5989   LearningRate 0.0434   Epoch: 6   Global Step: 84780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:39,863-Speed 3007.65 samples/sec   Loss 9.5806   LearningRate 0.0434   Epoch: 6   Global Step: 84790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:43,239-Speed 3035.03 samples/sec   Loss 9.7594   LearningRate 0.0434   Epoch: 6   Global Step: 84800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:46,580-Speed 3065.69 samples/sec   Loss 9.7063   LearningRate 0.0434   Epoch: 6   Global Step: 84810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:49,915-Speed 3070.84 samples/sec   Loss 9.6170   LearningRate 0.0434   Epoch: 6   Global Step: 84820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:53,394-Speed 2944.49 samples/sec   Loss 9.4656   LearningRate 0.0434   Epoch: 6   Global Step: 84830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:43:56,829-Speed 2982.14 samples/sec   Loss 9.6746   LearningRate 0.0434   Epoch: 6   Global Step: 84840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:44:00,250-Speed 2993.52 samples/sec   Loss 9.6150   LearningRate 0.0434   Epoch: 6   Global Step: 84850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:44:03,659-Speed 3004.96 samples/sec   Loss 9.5118   LearningRate 0.0433   Epoch: 6   Global Step: 84860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:44:07,036-Speed 3033.23 samples/sec   Loss 9.4952   LearningRate 0.0433   Epoch: 6   Global Step: 84870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:44:10,472-Speed 2981.64 samples/sec   Loss 9.5364   LearningRate 0.0433   Epoch: 6   Global Step: 84880   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:44:13,848-Speed 3034.06 samples/sec   Loss 9.4527   LearningRate 0.0433   Epoch: 6   Global Step: 84890   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:44:17,261-Speed 3001.94 samples/sec   Loss 9.6063   LearningRate 0.0433   Epoch: 6   Global Step: 84900   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:44:20,636-Speed 3035.07 samples/sec   Loss 9.4846   LearningRate 0.0433   Epoch: 6   Global Step: 84910   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:44:24,029-Speed 3019.04 samples/sec   Loss 9.5990   LearningRate 0.0433   Epoch: 6   Global Step: 84920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:44:27,392-Speed 3045.34 samples/sec   Loss 9.5950   LearningRate 0.0433   Epoch: 6   Global Step: 84930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:44:30,878-Speed 2938.69 samples/sec   Loss 9.6681   LearningRate 0.0433   Epoch: 6   Global Step: 84940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:44:34,279-Speed 3011.96 samples/sec   Loss 9.6085   LearningRate 0.0433   Epoch: 6   Global Step: 84950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:44:37,673-Speed 3018.45 samples/sec   Loss 9.6572   LearningRate 0.0433   Epoch: 6   Global Step: 84960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:44:41,054-Speed 3029.31 samples/sec   Loss 9.5765   LearningRate 0.0433   Epoch: 6   Global Step: 84970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:44:44,464-Speed 3005.01 samples/sec   Loss 9.6366   LearningRate 0.0433   Epoch: 6   Global Step: 84980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:44:47,828-Speed 3044.85 samples/sec   Loss 9.5247   LearningRate 0.0433   Epoch: 6   Global Step: 84990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:44:51,171-Speed 3063.42 samples/sec   Loss 9.5928   LearningRate 0.0433   Epoch: 6   Global Step: 85000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:44:54,568-Speed 3015.89 samples/sec   Loss 9.7868   LearningRate 0.0433   Epoch: 6   Global Step: 85010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:44:57,948-Speed 3030.65 samples/sec   Loss 9.6006   LearningRate 0.0433   Epoch: 6   Global Step: 85020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:45:01,322-Speed 3035.13 samples/sec   Loss 9.9367   LearningRate 0.0433   Epoch: 6   Global Step: 85030   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:45:04,715-Speed 3018.96 samples/sec   Loss 9.6872   LearningRate 0.0433   Epoch: 6   Global Step: 85040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:45:08,079-Speed 3045.97 samples/sec   Loss 9.5724   LearningRate 0.0432   Epoch: 6   Global Step: 85050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:45:11,483-Speed 3009.19 samples/sec   Loss 9.4627   LearningRate 0.0432   Epoch: 6   Global Step: 85060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:45:14,872-Speed 3022.73 samples/sec   Loss 9.5306   LearningRate 0.0432   Epoch: 6   Global Step: 85070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:45:18,247-Speed 3034.41 samples/sec   Loss 9.6917   LearningRate 0.0432   Epoch: 6   Global Step: 85080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:45:21,656-Speed 3004.76 samples/sec   Loss 9.6531   LearningRate 0.0432   Epoch: 6   Global Step: 85090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:45:25,054-Speed 3014.41 samples/sec   Loss 9.7032   LearningRate 0.0432   Epoch: 6   Global Step: 85100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:45:28,435-Speed 3030.07 samples/sec   Loss 9.6191   LearningRate 0.0432   Epoch: 6   Global Step: 85110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:45:31,867-Speed 2984.51 samples/sec   Loss 9.5223   LearningRate 0.0432   Epoch: 6   Global Step: 85120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:45:35,243-Speed 3033.58 samples/sec   Loss 9.5096   LearningRate 0.0432   Epoch: 6   Global Step: 85130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:45:38,598-Speed 3053.51 samples/sec   Loss 9.5034   LearningRate 0.0432   Epoch: 6   Global Step: 85140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:45:42,048-Speed 2969.68 samples/sec   Loss 9.6262   LearningRate 0.0432   Epoch: 6   Global Step: 85150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:45:45,472-Speed 2991.35 samples/sec   Loss 9.7299   LearningRate 0.0432   Epoch: 6   Global Step: 85160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:45:48,923-Speed 2967.84 samples/sec   Loss 9.5964   LearningRate 0.0432   Epoch: 6   Global Step: 85170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:45:52,357-Speed 2983.14 samples/sec   Loss 9.5329   LearningRate 0.0432   Epoch: 6   Global Step: 85180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:45:55,734-Speed 3032.64 samples/sec   Loss 9.5358   LearningRate 0.0432   Epoch: 6   Global Step: 85190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:45:59,120-Speed 3025.17 samples/sec   Loss 9.5938   LearningRate 0.0432   Epoch: 6   Global Step: 85200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:46:02,573-Speed 2966.55 samples/sec   Loss 9.6750   LearningRate 0.0432   Epoch: 6   Global Step: 85210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:46:05,935-Speed 3047.13 samples/sec   Loss 9.6797   LearningRate 0.0432   Epoch: 6   Global Step: 85220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:46:09,360-Speed 2990.54 samples/sec   Loss 9.6723   LearningRate 0.0432   Epoch: 6   Global Step: 85230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:46:12,748-Speed 3022.98 samples/sec   Loss 9.6677   LearningRate 0.0431   Epoch: 6   Global Step: 85240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:46:16,101-Speed 3054.58 samples/sec   Loss 9.4760   LearningRate 0.0431   Epoch: 6   Global Step: 85250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:46:19,437-Speed 3070.59 samples/sec   Loss 9.6877   LearningRate 0.0431   Epoch: 6   Global Step: 85260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:46:22,830-Speed 3019.16 samples/sec   Loss 9.5035   LearningRate 0.0431   Epoch: 6   Global Step: 85270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:46:26,208-Speed 3032.12 samples/sec   Loss 9.5607   LearningRate 0.0431   Epoch: 6   Global Step: 85280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:46:29,660-Speed 2969.63 samples/sec   Loss 9.5887   LearningRate 0.0431   Epoch: 6   Global Step: 85290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:46:33,125-Speed 2956.21 samples/sec   Loss 9.4774   LearningRate 0.0431   Epoch: 6   Global Step: 85300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:46:36,513-Speed 3022.91 samples/sec   Loss 9.6588   LearningRate 0.0431   Epoch: 6   Global Step: 85310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:46:39,935-Speed 2993.50 samples/sec   Loss 9.7193   LearningRate 0.0431   Epoch: 6   Global Step: 85320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:46:43,393-Speed 2962.44 samples/sec   Loss 9.7163   LearningRate 0.0431   Epoch: 6   Global Step: 85330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:46:46,830-Speed 2980.29 samples/sec   Loss 9.6185   LearningRate 0.0431   Epoch: 6   Global Step: 85340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:46:50,252-Speed 2992.93 samples/sec   Loss 9.6576   LearningRate 0.0431   Epoch: 6   Global Step: 85350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:46:53,672-Speed 2995.62 samples/sec   Loss 9.5768   LearningRate 0.0431   Epoch: 6   Global Step: 85360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:46:57,047-Speed 3034.47 samples/sec   Loss 9.6686   LearningRate 0.0431   Epoch: 6   Global Step: 85370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:00,427-Speed 3030.02 samples/sec   Loss 9.4105   LearningRate 0.0431   Epoch: 6   Global Step: 85380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:03,807-Speed 3030.47 samples/sec   Loss 9.5970   LearningRate 0.0431   Epoch: 6   Global Step: 85390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:07,306-Speed 2928.08 samples/sec   Loss 9.5237   LearningRate 0.0431   Epoch: 6   Global Step: 85400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:10,729-Speed 2992.15 samples/sec   Loss 9.5825   LearningRate 0.0431   Epoch: 6   Global Step: 85410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:14,131-Speed 3011.06 samples/sec   Loss 9.5397   LearningRate 0.0431   Epoch: 6   Global Step: 85420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:17,503-Speed 3037.60 samples/sec   Loss 9.6539   LearningRate 0.0430   Epoch: 6   Global Step: 85430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:20,879-Speed 3033.76 samples/sec   Loss 9.5997   LearningRate 0.0430   Epoch: 6   Global Step: 85440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:24,241-Speed 3046.06 samples/sec   Loss 9.4866   LearningRate 0.0430   Epoch: 6   Global Step: 85450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:27,626-Speed 3026.20 samples/sec   Loss 9.5535   LearningRate 0.0430   Epoch: 6   Global Step: 85460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:31,024-Speed 3014.35 samples/sec   Loss 9.5141   LearningRate 0.0430   Epoch: 6   Global Step: 85470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:34,451-Speed 2988.75 samples/sec   Loss 9.5009   LearningRate 0.0430   Epoch: 6   Global Step: 85480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:37,892-Speed 2977.82 samples/sec   Loss 9.6659   LearningRate 0.0430   Epoch: 6   Global Step: 85490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:41,292-Speed 3013.04 samples/sec   Loss 9.5682   LearningRate 0.0430   Epoch: 6   Global Step: 85500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:44,692-Speed 3012.13 samples/sec   Loss 9.6688   LearningRate 0.0430   Epoch: 6   Global Step: 85510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:48,160-Speed 2954.16 samples/sec   Loss 9.4740   LearningRate 0.0430   Epoch: 6   Global Step: 85520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:51,573-Speed 3000.57 samples/sec   Loss 9.5691   LearningRate 0.0430   Epoch: 6   Global Step: 85530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:54,942-Speed 3040.33 samples/sec   Loss 9.4302   LearningRate 0.0430   Epoch: 6   Global Step: 85540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:47:58,415-Speed 2949.44 samples/sec   Loss 9.5904   LearningRate 0.0430   Epoch: 6   Global Step: 85550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:01,841-Speed 2990.09 samples/sec   Loss 9.4156   LearningRate 0.0430   Epoch: 6   Global Step: 85560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:05,184-Speed 3064.40 samples/sec   Loss 9.6464   LearningRate 0.0430   Epoch: 6   Global Step: 85570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:48:08,572-Speed 3022.59 samples/sec   Loss 9.6917   LearningRate 0.0430   Epoch: 6   Global Step: 85580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:11,926-Speed 3054.10 samples/sec   Loss 9.4380   LearningRate 0.0430   Epoch: 6   Global Step: 85590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:15,347-Speed 2994.11 samples/sec   Loss 9.5427   LearningRate 0.0430   Epoch: 6   Global Step: 85600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:18,733-Speed 3025.29 samples/sec   Loss 9.6990   LearningRate 0.0430   Epoch: 6   Global Step: 85610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:22,123-Speed 3021.63 samples/sec   Loss 9.5881   LearningRate 0.0429   Epoch: 6   Global Step: 85620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:25,571-Speed 2970.54 samples/sec   Loss 9.6501   LearningRate 0.0429   Epoch: 6   Global Step: 85630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:28,970-Speed 3013.29 samples/sec   Loss 9.4386   LearningRate 0.0429   Epoch: 6   Global Step: 85640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:32,317-Speed 3060.44 samples/sec   Loss 9.8613   LearningRate 0.0429   Epoch: 6   Global Step: 85650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:35,678-Speed 3047.99 samples/sec   Loss 9.5380   LearningRate 0.0429   Epoch: 6   Global Step: 85660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:39,033-Speed 3052.87 samples/sec   Loss 9.6746   LearningRate 0.0429   Epoch: 6   Global Step: 85670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:42,374-Speed 3065.67 samples/sec   Loss 9.5556   LearningRate 0.0429   Epoch: 6   Global Step: 85680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:48:45,797-Speed 2992.53 samples/sec   Loss 9.5032   LearningRate 0.0429   Epoch: 6   Global Step: 85690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:48:49,132-Speed 3071.29 samples/sec   Loss 9.6843   LearningRate 0.0429   Epoch: 6   Global Step: 85700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:52,600-Speed 2953.47 samples/sec   Loss 9.5519   LearningRate 0.0429   Epoch: 6   Global Step: 85710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:55,999-Speed 3013.52 samples/sec   Loss 9.4998   LearningRate 0.0429   Epoch: 6   Global Step: 85720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:48:59,366-Speed 3042.14 samples/sec   Loss 9.6742   LearningRate 0.0429   Epoch: 6   Global Step: 85730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:49:02,734-Speed 3041.20 samples/sec   Loss 9.4778   LearningRate 0.0429   Epoch: 6   Global Step: 85740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:49:06,076-Speed 3065.48 samples/sec   Loss 9.5190   LearningRate 0.0429   Epoch: 6   Global Step: 85750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:49:09,548-Speed 2949.66 samples/sec   Loss 9.6039   LearningRate 0.0429   Epoch: 6   Global Step: 85760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:49:12,949-Speed 3013.05 samples/sec   Loss 9.4493   LearningRate 0.0429   Epoch: 6   Global Step: 85770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:49:16,364-Speed 2999.66 samples/sec   Loss 9.5903   LearningRate 0.0429   Epoch: 6   Global Step: 85780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:49:19,750-Speed 3024.26 samples/sec   Loss 9.6695   LearningRate 0.0429   Epoch: 6   Global Step: 85790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:49:23,157-Speed 3006.95 samples/sec   Loss 9.6194   LearningRate 0.0429   Epoch: 6   Global Step: 85800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:49:26,508-Speed 3056.31 samples/sec   Loss 9.4967   LearningRate 0.0428   Epoch: 6   Global Step: 85810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:49:29,938-Speed 2986.60 samples/sec   Loss 9.5300   LearningRate 0.0428   Epoch: 6   Global Step: 85820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:49:33,322-Speed 3026.29 samples/sec   Loss 9.5478   LearningRate 0.0428   Epoch: 6   Global Step: 85830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:49:36,735-Speed 3001.24 samples/sec   Loss 9.4669   LearningRate 0.0428   Epoch: 6   Global Step: 85840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:49:40,130-Speed 3017.53 samples/sec   Loss 9.6346   LearningRate 0.0428   Epoch: 6   Global Step: 85850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:49:43,597-Speed 2954.11 samples/sec   Loss 9.5330   LearningRate 0.0428   Epoch: 6   Global Step: 85860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:49:46,969-Speed 3037.64 samples/sec   Loss 9.5319   LearningRate 0.0428   Epoch: 6   Global Step: 85870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:49:50,372-Speed 3010.42 samples/sec   Loss 9.4257   LearningRate 0.0428   Epoch: 6   Global Step: 85880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:49:53,732-Speed 3048.69 samples/sec   Loss 9.6136   LearningRate 0.0428   Epoch: 6   Global Step: 85890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:49:57,082-Speed 3057.22 samples/sec   Loss 9.6099   LearningRate 0.0428   Epoch: 6   Global Step: 85900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:50:00,447-Speed 3043.76 samples/sec   Loss 9.5695   LearningRate 0.0428   Epoch: 6   Global Step: 85910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:03,890-Speed 2974.84 samples/sec   Loss 9.5542   LearningRate 0.0428   Epoch: 6   Global Step: 85920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:07,245-Speed 3053.30 samples/sec   Loss 9.5825   LearningRate 0.0428   Epoch: 6   Global Step: 85930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:10,641-Speed 3016.74 samples/sec   Loss 9.6101   LearningRate 0.0428   Epoch: 6   Global Step: 85940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:14,075-Speed 2982.49 samples/sec   Loss 9.4433   LearningRate 0.0428   Epoch: 6   Global Step: 85950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:17,455-Speed 3030.95 samples/sec   Loss 9.4958   LearningRate 0.0428   Epoch: 6   Global Step: 85960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:20,860-Speed 3007.47 samples/sec   Loss 9.5727   LearningRate 0.0428   Epoch: 6   Global Step: 85970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:24,197-Speed 3069.97 samples/sec   Loss 9.5173   LearningRate 0.0428   Epoch: 6   Global Step: 85980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:27,621-Speed 2991.64 samples/sec   Loss 9.7483   LearningRate 0.0428   Epoch: 6   Global Step: 85990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:31,092-Speed 2951.18 samples/sec   Loss 9.5701   LearningRate 0.0427   Epoch: 6   Global Step: 86000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:34,437-Speed 3061.79 samples/sec   Loss 9.4408   LearningRate 0.0427   Epoch: 6   Global Step: 86010   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:50:37,788-Speed 3058.02 samples/sec   Loss 9.7122   LearningRate 0.0427   Epoch: 6   Global Step: 86020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:41,157-Speed 3040.33 samples/sec   Loss 9.5068   LearningRate 0.0427   Epoch: 6   Global Step: 86030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:44,527-Speed 3038.91 samples/sec   Loss 9.6277   LearningRate 0.0427   Epoch: 6   Global Step: 86040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:47,915-Speed 3023.79 samples/sec   Loss 9.5735   LearningRate 0.0427   Epoch: 6   Global Step: 86050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:51,412-Speed 2928.36 samples/sec   Loss 9.4444   LearningRate 0.0427   Epoch: 6   Global Step: 86060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:54,842-Speed 2986.74 samples/sec   Loss 9.5986   LearningRate 0.0427   Epoch: 6   Global Step: 86070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:50:58,196-Speed 3053.46 samples/sec   Loss 9.4655   LearningRate 0.0427   Epoch: 6   Global Step: 86080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:01,640-Speed 2973.92 samples/sec   Loss 9.3944   LearningRate 0.0427   Epoch: 6   Global Step: 86090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:04,998-Speed 3050.80 samples/sec   Loss 9.4877   LearningRate 0.0427   Epoch: 6   Global Step: 86100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:08,356-Speed 3049.89 samples/sec   Loss 9.7201   LearningRate 0.0427   Epoch: 6   Global Step: 86110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:11,684-Speed 3078.37 samples/sec   Loss 9.4504   LearningRate 0.0427   Epoch: 6   Global Step: 86120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:51:15,060-Speed 3033.89 samples/sec   Loss 9.4921   LearningRate 0.0427   Epoch: 6   Global Step: 86130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:18,508-Speed 2970.70 samples/sec   Loss 9.5714   LearningRate 0.0427   Epoch: 6   Global Step: 86140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:21,907-Speed 3013.76 samples/sec   Loss 9.6278   LearningRate 0.0427   Epoch: 6   Global Step: 86150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:25,343-Speed 2981.01 samples/sec   Loss 9.4617   LearningRate 0.0427   Epoch: 6   Global Step: 86160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:28,789-Speed 2972.53 samples/sec   Loss 9.4949   LearningRate 0.0427   Epoch: 6   Global Step: 86170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:32,233-Speed 2973.96 samples/sec   Loss 9.5787   LearningRate 0.0427   Epoch: 6   Global Step: 86180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:35,650-Speed 2998.01 samples/sec   Loss 9.5475   LearningRate 0.0426   Epoch: 6   Global Step: 86190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:39,098-Speed 2970.91 samples/sec   Loss 9.5615   LearningRate 0.0426   Epoch: 6   Global Step: 86200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:42,517-Speed 2995.62 samples/sec   Loss 9.4731   LearningRate 0.0426   Epoch: 6   Global Step: 86210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:45,965-Speed 2971.17 samples/sec   Loss 9.4520   LearningRate 0.0426   Epoch: 6   Global Step: 86220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:49,363-Speed 3014.73 samples/sec   Loss 9.5283   LearningRate 0.0426   Epoch: 6   Global Step: 86230   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:51:52,742-Speed 3031.38 samples/sec   Loss 9.6252   LearningRate 0.0426   Epoch: 6   Global Step: 86240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:51:56,088-Speed 3060.67 samples/sec   Loss 9.5746   LearningRate 0.0426   Epoch: 6   Global Step: 86250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:51:59,519-Speed 2985.47 samples/sec   Loss 9.3814   LearningRate 0.0426   Epoch: 6   Global Step: 86260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:02,994-Speed 2947.96 samples/sec   Loss 9.6302   LearningRate 0.0426   Epoch: 6   Global Step: 86270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:06,413-Speed 2996.07 samples/sec   Loss 9.4963   LearningRate 0.0426   Epoch: 6   Global Step: 86280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:09,764-Speed 3056.03 samples/sec   Loss 9.4970   LearningRate 0.0426   Epoch: 6   Global Step: 86290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:13,149-Speed 3026.26 samples/sec   Loss 9.6056   LearningRate 0.0426   Epoch: 6   Global Step: 86300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:16,541-Speed 3020.17 samples/sec   Loss 9.4848   LearningRate 0.0426   Epoch: 6   Global Step: 86310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:19,888-Speed 3059.90 samples/sec   Loss 9.5863   LearningRate 0.0426   Epoch: 6   Global Step: 86320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:23,352-Speed 2956.93 samples/sec   Loss 9.4429   LearningRate 0.0426   Epoch: 6   Global Step: 86330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:26,766-Speed 3000.71 samples/sec   Loss 9.4415   LearningRate 0.0426   Epoch: 6   Global Step: 86340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:30,152-Speed 3025.19 samples/sec   Loss 9.5100   LearningRate 0.0426   Epoch: 6   Global Step: 86350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:52:33,481-Speed 3077.13 samples/sec   Loss 9.5028   LearningRate 0.0426   Epoch: 6   Global Step: 86360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:36,812-Speed 3074.53 samples/sec   Loss 9.4140   LearningRate 0.0426   Epoch: 6   Global Step: 86370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:40,175-Speed 3045.91 samples/sec   Loss 9.5865   LearningRate 0.0425   Epoch: 6   Global Step: 86380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:43,506-Speed 3075.10 samples/sec   Loss 9.5508   LearningRate 0.0425   Epoch: 6   Global Step: 86390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:46,948-Speed 2975.95 samples/sec   Loss 9.5463   LearningRate 0.0425   Epoch: 6   Global Step: 86400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:50,324-Speed 3034.56 samples/sec   Loss 9.5845   LearningRate 0.0425   Epoch: 6   Global Step: 86410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:53,657-Speed 3072.87 samples/sec   Loss 9.5141   LearningRate 0.0425   Epoch: 6   Global Step: 86420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:52:57,063-Speed 3007.64 samples/sec   Loss 9.3906   LearningRate 0.0425   Epoch: 6   Global Step: 86430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:00,494-Speed 2984.85 samples/sec   Loss 9.4743   LearningRate 0.0425   Epoch: 6   Global Step: 86440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:03,923-Speed 2987.45 samples/sec   Loss 9.5652   LearningRate 0.0425   Epoch: 6   Global Step: 86450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:07,393-Speed 2951.80 samples/sec   Loss 9.4958   LearningRate 0.0425   Epoch: 6   Global Step: 86460   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:53:10,755-Speed 3047.35 samples/sec   Loss 9.4153   LearningRate 0.0425   Epoch: 6   Global Step: 86470   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:53:14,168-Speed 3000.56 samples/sec   Loss 9.4741   LearningRate 0.0425   Epoch: 6   Global Step: 86480   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:53:17,529-Speed 3047.70 samples/sec   Loss 9.5205   LearningRate 0.0425   Epoch: 6   Global Step: 86490   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:53:20,850-Speed 3085.69 samples/sec   Loss 9.5854   LearningRate 0.0425   Epoch: 6   Global Step: 86500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:24,304-Speed 2966.16 samples/sec   Loss 9.4093   LearningRate 0.0425   Epoch: 6   Global Step: 86510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:27,750-Speed 2971.71 samples/sec   Loss 9.4119   LearningRate 0.0425   Epoch: 6   Global Step: 86520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:31,106-Speed 3052.80 samples/sec   Loss 9.5935   LearningRate 0.0425   Epoch: 6   Global Step: 86530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:34,481-Speed 3034.43 samples/sec   Loss 9.4852   LearningRate 0.0425   Epoch: 6   Global Step: 86540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:37,921-Speed 2977.81 samples/sec   Loss 9.4319   LearningRate 0.0425   Epoch: 6   Global Step: 86550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:41,320-Speed 3013.27 samples/sec   Loss 9.5248   LearningRate 0.0425   Epoch: 6   Global Step: 86560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:44,796-Speed 2946.53 samples/sec   Loss 9.5539   LearningRate 0.0424   Epoch: 6   Global Step: 86570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:48,214-Speed 2997.60 samples/sec   Loss 9.3987   LearningRate 0.0424   Epoch: 6   Global Step: 86580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:51,599-Speed 3025.73 samples/sec   Loss 9.3548   LearningRate 0.0424   Epoch: 6   Global Step: 86590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:53:55,029-Speed 2986.16 samples/sec   Loss 9.4495   LearningRate 0.0424   Epoch: 6   Global Step: 86600   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:53:58,398-Speed 3040.06 samples/sec   Loss 9.5100   LearningRate 0.0424   Epoch: 6   Global Step: 86610   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:54:01,771-Speed 3036.61 samples/sec   Loss 9.5515   LearningRate 0.0424   Epoch: 6   Global Step: 86620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:05,075-Speed 3100.01 samples/sec   Loss 9.6233   LearningRate 0.0424   Epoch: 6   Global Step: 86630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:08,447-Speed 3038.25 samples/sec   Loss 9.4303   LearningRate 0.0424   Epoch: 6   Global Step: 86640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:11,804-Speed 3051.07 samples/sec   Loss 9.5319   LearningRate 0.0424   Epoch: 6   Global Step: 86650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:15,129-Speed 3079.86 samples/sec   Loss 9.4896   LearningRate 0.0424   Epoch: 6   Global Step: 86660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:18,503-Speed 3036.66 samples/sec   Loss 9.5034   LearningRate 0.0424   Epoch: 6   Global Step: 86670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:21,902-Speed 3013.05 samples/sec   Loss 9.4286   LearningRate 0.0424   Epoch: 6   Global Step: 86680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:25,290-Speed 3024.26 samples/sec   Loss 9.5582   LearningRate 0.0424   Epoch: 6   Global Step: 86690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:28,674-Speed 3026.19 samples/sec   Loss 9.4728   LearningRate 0.0424   Epoch: 6   Global Step: 86700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:32,070-Speed 3016.78 samples/sec   Loss 9.4160   LearningRate 0.0424   Epoch: 6   Global Step: 86710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:35,414-Speed 3062.58 samples/sec   Loss 9.4431   LearningRate 0.0424   Epoch: 6   Global Step: 86720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:38,769-Speed 3053.96 samples/sec   Loss 9.5210   LearningRate 0.0424   Epoch: 6   Global Step: 86730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:42,111-Speed 3064.44 samples/sec   Loss 9.6023   LearningRate 0.0424   Epoch: 6   Global Step: 86740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:45,473-Speed 3046.82 samples/sec   Loss 9.4413   LearningRate 0.0424   Epoch: 6   Global Step: 86750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:48,890-Speed 2997.30 samples/sec   Loss 9.6043   LearningRate 0.0423   Epoch: 6   Global Step: 86760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:52,340-Speed 2969.32 samples/sec   Loss 9.5068   LearningRate 0.0423   Epoch: 6   Global Step: 86770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:55,858-Speed 2911.32 samples/sec   Loss 9.4630   LearningRate 0.0423   Epoch: 6   Global Step: 86780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:54:59,283-Speed 2990.43 samples/sec   Loss 9.4134   LearningRate 0.0423   Epoch: 6   Global Step: 86790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:55:02,629-Speed 3061.84 samples/sec   Loss 9.5620   LearningRate 0.0423   Epoch: 6   Global Step: 86800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:55:06,040-Speed 3002.79 samples/sec   Loss 9.5222   LearningRate 0.0423   Epoch: 6   Global Step: 86810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:55:09,366-Speed 3086.28 samples/sec   Loss 9.4873   LearningRate 0.0423   Epoch: 6   Global Step: 86820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:55:12,783-Speed 2997.77 samples/sec   Loss 9.5895   LearningRate 0.0423   Epoch: 6   Global Step: 86830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:55:16,170-Speed 3024.73 samples/sec   Loss 9.3970   LearningRate 0.0423   Epoch: 6   Global Step: 86840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:55:19,564-Speed 3018.78 samples/sec   Loss 9.6004   LearningRate 0.0423   Epoch: 6   Global Step: 86850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:55:22,958-Speed 3017.33 samples/sec   Loss 9.4227   LearningRate 0.0423   Epoch: 6   Global Step: 86860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:55:26,335-Speed 3033.08 samples/sec   Loss 9.5377   LearningRate 0.0423   Epoch: 6   Global Step: 86870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:55:29,723-Speed 3023.52 samples/sec   Loss 9.5177   LearningRate 0.0423   Epoch: 6   Global Step: 86880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:55:33,210-Speed 2937.76 samples/sec   Loss 9.3525   LearningRate 0.0423   Epoch: 6   Global Step: 86890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:55:36,606-Speed 3015.81 samples/sec   Loss 9.3812   LearningRate 0.0423   Epoch: 6   Global Step: 86900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:55:39,991-Speed 3026.00 samples/sec   Loss 9.4342   LearningRate 0.0423   Epoch: 6   Global Step: 86910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 09:55:43,408-Speed 2997.47 samples/sec   Loss 9.4053   LearningRate 0.0423   Epoch: 6   Global Step: 86920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:55:46,802-Speed 3017.70 samples/sec   Loss 9.3387   LearningRate 0.0423   Epoch: 6   Global Step: 86930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:55:50,488-Speed 2780.26 samples/sec   Loss 9.5867   LearningRate 0.0423   Epoch: 6   Global Step: 86940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:56:21,938-Speed 325.62 samples/sec   Loss 9.0649   LearningRate 0.0422   Epoch: 7   Global Step: 86950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:56:25,354-Speed 2998.83 samples/sec   Loss 7.9646   LearningRate 0.0422   Epoch: 7   Global Step: 86960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:56:28,866-Speed 2916.85 samples/sec   Loss 7.9309   LearningRate 0.0422   Epoch: 7   Global Step: 86970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:56:32,297-Speed 2985.48 samples/sec   Loss 7.9687   LearningRate 0.0422   Epoch: 7   Global Step: 86980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:56:35,760-Speed 2958.38 samples/sec   Loss 7.8560   LearningRate 0.0422   Epoch: 7   Global Step: 86990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:56:39,275-Speed 2913.92 samples/sec   Loss 8.0329   LearningRate 0.0422   Epoch: 7   Global Step: 87000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:56:42,687-Speed 3001.61 samples/sec   Loss 7.9530   LearningRate 0.0422   Epoch: 7   Global Step: 87010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:56:46,153-Speed 2955.95 samples/sec   Loss 7.9844   LearningRate 0.0422   Epoch: 7   Global Step: 87020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:56:49,615-Speed 2958.13 samples/sec   Loss 7.9556   LearningRate 0.0422   Epoch: 7   Global Step: 87030   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:56:53,095-Speed 2943.89 samples/sec   Loss 7.9791   LearningRate 0.0422   Epoch: 7   Global Step: 87040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:56:56,673-Speed 2862.70 samples/sec   Loss 7.9496   LearningRate 0.0422   Epoch: 7   Global Step: 87050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:57:00,060-Speed 3024.70 samples/sec   Loss 8.0040   LearningRate 0.0422   Epoch: 7   Global Step: 87060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:57:03,436-Speed 3033.67 samples/sec   Loss 7.9732   LearningRate 0.0422   Epoch: 7   Global Step: 87070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:57:06,911-Speed 2948.06 samples/sec   Loss 8.0815   LearningRate 0.0422   Epoch: 7   Global Step: 87080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:57:10,354-Speed 2974.87 samples/sec   Loss 8.0999   LearningRate 0.0422   Epoch: 7   Global Step: 87090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:57:13,704-Speed 3057.85 samples/sec   Loss 8.1002   LearningRate 0.0422   Epoch: 7   Global Step: 87100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:57:17,253-Speed 2885.66 samples/sec   Loss 8.2582   LearningRate 0.0422   Epoch: 7   Global Step: 87110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:57:20,681-Speed 2988.65 samples/sec   Loss 8.0822   LearningRate 0.0422   Epoch: 7   Global Step: 87120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:57:24,091-Speed 3003.92 samples/sec   Loss 8.0259   LearningRate 0.0422   Epoch: 7   Global Step: 87130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:57:27,714-Speed 2826.43 samples/sec   Loss 7.9177   LearningRate 0.0421   Epoch: 7   Global Step: 87140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:57:31,216-Speed 2925.26 samples/sec   Loss 8.2360   LearningRate 0.0421   Epoch: 7   Global Step: 87150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:57:34,612-Speed 3016.14 samples/sec   Loss 8.1162   LearningRate 0.0421   Epoch: 7   Global Step: 87160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:57:38,040-Speed 2988.48 samples/sec   Loss 7.9445   LearningRate 0.0421   Epoch: 7   Global Step: 87170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:57:41,456-Speed 2998.24 samples/sec   Loss 8.0810   LearningRate 0.0421   Epoch: 7   Global Step: 87180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:57:44,889-Speed 2983.64 samples/sec   Loss 8.0740   LearningRate 0.0421   Epoch: 7   Global Step: 87190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:57:48,284-Speed 3016.69 samples/sec   Loss 8.2013   LearningRate 0.0421   Epoch: 7   Global Step: 87200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:57:51,656-Speed 3037.80 samples/sec   Loss 8.1608   LearningRate 0.0421   Epoch: 7   Global Step: 87210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:57:55,133-Speed 2945.78 samples/sec   Loss 8.1599   LearningRate 0.0421   Epoch: 7   Global Step: 87220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:57:58,588-Speed 2965.40 samples/sec   Loss 8.1002   LearningRate 0.0421   Epoch: 7   Global Step: 87230   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:02,040-Speed 2967.34 samples/sec   Loss 8.1885   LearningRate 0.0421   Epoch: 7   Global Step: 87240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:05,517-Speed 2945.81 samples/sec   Loss 8.1173   LearningRate 0.0421   Epoch: 7   Global Step: 87250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:08,916-Speed 3013.77 samples/sec   Loss 8.2527   LearningRate 0.0421   Epoch: 7   Global Step: 87260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:12,255-Speed 3067.02 samples/sec   Loss 8.2013   LearningRate 0.0421   Epoch: 7   Global Step: 87270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:15,666-Speed 3002.81 samples/sec   Loss 8.2368   LearningRate 0.0421   Epoch: 7   Global Step: 87280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:19,158-Speed 2933.53 samples/sec   Loss 8.1870   LearningRate 0.0421   Epoch: 7   Global Step: 87290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:22,597-Speed 2978.36 samples/sec   Loss 8.2838   LearningRate 0.0421   Epoch: 7   Global Step: 87300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:25,968-Speed 3038.53 samples/sec   Loss 8.3135   LearningRate 0.0421   Epoch: 7   Global Step: 87310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:29,323-Speed 3052.83 samples/sec   Loss 8.1282   LearningRate 0.0421   Epoch: 7   Global Step: 87320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:32,745-Speed 2993.75 samples/sec   Loss 8.1738   LearningRate 0.0420   Epoch: 7   Global Step: 87330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:36,054-Speed 3095.60 samples/sec   Loss 8.1563   LearningRate 0.0420   Epoch: 7   Global Step: 87340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:39,438-Speed 3026.89 samples/sec   Loss 8.2270   LearningRate 0.0420   Epoch: 7   Global Step: 87350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:42,862-Speed 2991.48 samples/sec   Loss 8.1117   LearningRate 0.0420   Epoch: 7   Global Step: 87360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:46,238-Speed 3033.68 samples/sec   Loss 8.2281   LearningRate 0.0420   Epoch: 7   Global Step: 87370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:49,652-Speed 3000.26 samples/sec   Loss 8.3919   LearningRate 0.0420   Epoch: 7   Global Step: 87380   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:58:53,025-Speed 3037.11 samples/sec   Loss 8.2595   LearningRate 0.0420   Epoch: 7   Global Step: 87390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:58:56,355-Speed 3076.25 samples/sec   Loss 8.3148   LearningRate 0.0420   Epoch: 7   Global Step: 87400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:58:59,753-Speed 3014.18 samples/sec   Loss 8.1404   LearningRate 0.0420   Epoch: 7   Global Step: 87410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:59:03,122-Speed 3039.98 samples/sec   Loss 8.2477   LearningRate 0.0420   Epoch: 7   Global Step: 87420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:59:06,497-Speed 3034.79 samples/sec   Loss 8.4236   LearningRate 0.0420   Epoch: 7   Global Step: 87430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:59:09,926-Speed 2987.79 samples/sec   Loss 8.3309   LearningRate 0.0420   Epoch: 7   Global Step: 87440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:59:13,336-Speed 3003.45 samples/sec   Loss 8.2494   LearningRate 0.0420   Epoch: 7   Global Step: 87450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:59:16,653-Speed 3087.57 samples/sec   Loss 8.3610   LearningRate 0.0420   Epoch: 7   Global Step: 87460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:59:20,023-Speed 3039.15 samples/sec   Loss 8.2972   LearningRate 0.0420   Epoch: 7   Global Step: 87470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:59:23,399-Speed 3034.72 samples/sec   Loss 8.4125   LearningRate 0.0420   Epoch: 7   Global Step: 87480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:59:26,740-Speed 3065.66 samples/sec   Loss 8.3535   LearningRate 0.0420   Epoch: 7   Global Step: 87490   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:59:30,084-Speed 3062.69 samples/sec   Loss 8.5236   LearningRate 0.0420   Epoch: 7   Global Step: 87500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:59:33,446-Speed 3047.01 samples/sec   Loss 8.2762   LearningRate 0.0420   Epoch: 7   Global Step: 87510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:59:36,835-Speed 3022.17 samples/sec   Loss 8.3223   LearningRate 0.0420   Epoch: 7   Global Step: 87520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:59:40,253-Speed 2996.87 samples/sec   Loss 8.3086   LearningRate 0.0419   Epoch: 7   Global Step: 87530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:59:43,596-Speed 3064.16 samples/sec   Loss 8.3891   LearningRate 0.0419   Epoch: 7   Global Step: 87540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:59:47,006-Speed 3003.65 samples/sec   Loss 8.3887   LearningRate 0.0419   Epoch: 7   Global Step: 87550   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 09:59:50,514-Speed 2920.35 samples/sec   Loss 8.4697   LearningRate 0.0419   Epoch: 7   Global Step: 87560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:59:53,861-Speed 3060.35 samples/sec   Loss 8.3752   LearningRate 0.0419   Epoch: 7   Global Step: 87570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 09:59:57,242-Speed 3029.50 samples/sec   Loss 8.4165   LearningRate 0.0419   Epoch: 7   Global Step: 87580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:00:00,599-Speed 3051.21 samples/sec   Loss 8.3229   LearningRate 0.0419   Epoch: 7   Global Step: 87590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:00:03,923-Speed 3082.89 samples/sec   Loss 8.3413   LearningRate 0.0419   Epoch: 7   Global Step: 87600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:00:07,274-Speed 3056.93 samples/sec   Loss 8.4121   LearningRate 0.0419   Epoch: 7   Global Step: 87610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:00:10,653-Speed 3031.95 samples/sec   Loss 8.3110   LearningRate 0.0419   Epoch: 7   Global Step: 87620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:00:14,131-Speed 2944.78 samples/sec   Loss 8.2928   LearningRate 0.0419   Epoch: 7   Global Step: 87630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:00:17,489-Speed 3050.39 samples/sec   Loss 8.3724   LearningRate 0.0419   Epoch: 7   Global Step: 87640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:00:21,007-Speed 2911.65 samples/sec   Loss 8.4822   LearningRate 0.0419   Epoch: 7   Global Step: 87650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:00:24,429-Speed 2993.42 samples/sec   Loss 8.5025   LearningRate 0.0419   Epoch: 7   Global Step: 87660   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:00:27,804-Speed 3034.87 samples/sec   Loss 8.4944   LearningRate 0.0419   Epoch: 7   Global Step: 87670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:00:31,207-Speed 3010.32 samples/sec   Loss 8.3300   LearningRate 0.0419   Epoch: 7   Global Step: 87680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:00:34,631-Speed 2991.43 samples/sec   Loss 8.3290   LearningRate 0.0419   Epoch: 7   Global Step: 87690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:00:37,970-Speed 3067.69 samples/sec   Loss 8.3337   LearningRate 0.0419   Epoch: 7   Global Step: 87700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:00:41,353-Speed 3028.04 samples/sec   Loss 8.3552   LearningRate 0.0419   Epoch: 7   Global Step: 87710   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:00:44,703-Speed 3058.29 samples/sec   Loss 8.4089   LearningRate 0.0418   Epoch: 7   Global Step: 87720   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:00:48,073-Speed 3038.98 samples/sec   Loss 8.4817   LearningRate 0.0418   Epoch: 7   Global Step: 87730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:00:51,442-Speed 3039.80 samples/sec   Loss 8.4017   LearningRate 0.0418   Epoch: 7   Global Step: 87740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:00:54,833-Speed 3021.03 samples/sec   Loss 8.5269   LearningRate 0.0418   Epoch: 7   Global Step: 87750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:00:58,176-Speed 3063.95 samples/sec   Loss 8.4717   LearningRate 0.0418   Epoch: 7   Global Step: 87760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:01:01,516-Speed 3066.49 samples/sec   Loss 8.4960   LearningRate 0.0418   Epoch: 7   Global Step: 87770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:01:04,915-Speed 3013.86 samples/sec   Loss 8.4833   LearningRate 0.0418   Epoch: 7   Global Step: 87780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:01:08,235-Speed 3085.28 samples/sec   Loss 8.4427   LearningRate 0.0418   Epoch: 7   Global Step: 87790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:01:11,579-Speed 3063.39 samples/sec   Loss 8.4338   LearningRate 0.0418   Epoch: 7   Global Step: 87800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:01:14,920-Speed 3065.72 samples/sec   Loss 8.3885   LearningRate 0.0418   Epoch: 7   Global Step: 87810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:01:18,314-Speed 3017.88 samples/sec   Loss 8.4656   LearningRate 0.0418   Epoch: 7   Global Step: 87820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:01:21,703-Speed 3022.20 samples/sec   Loss 8.3215   LearningRate 0.0418   Epoch: 7   Global Step: 87830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:01:25,107-Speed 3009.00 samples/sec   Loss 8.5538   LearningRate 0.0418   Epoch: 7   Global Step: 87840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:01:28,537-Speed 2987.00 samples/sec   Loss 8.6022   LearningRate 0.0418   Epoch: 7   Global Step: 87850   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:01:31,851-Speed 3090.27 samples/sec   Loss 8.4965   LearningRate 0.0418   Epoch: 7   Global Step: 87860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:01:35,228-Speed 3033.24 samples/sec   Loss 8.4417   LearningRate 0.0418   Epoch: 7   Global Step: 87870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:01:38,656-Speed 2987.95 samples/sec   Loss 8.4123   LearningRate 0.0418   Epoch: 7   Global Step: 87880   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:01:42,063-Speed 3006.29 samples/sec   Loss 8.4671   LearningRate 0.0418   Epoch: 7   Global Step: 87890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:01:45,431-Speed 3042.29 samples/sec   Loss 8.5051   LearningRate 0.0418   Epoch: 7   Global Step: 87900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:01:48,781-Speed 3056.82 samples/sec   Loss 8.6747   LearningRate 0.0417   Epoch: 7   Global Step: 87910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:01:52,159-Speed 3032.90 samples/sec   Loss 8.4271   LearningRate 0.0417   Epoch: 7   Global Step: 87920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:01:55,510-Speed 3055.87 samples/sec   Loss 8.5293   LearningRate 0.0417   Epoch: 7   Global Step: 87930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:01:58,859-Speed 3058.66 samples/sec   Loss 8.3778   LearningRate 0.0417   Epoch: 7   Global Step: 87940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:02,239-Speed 3031.54 samples/sec   Loss 8.5616   LearningRate 0.0417   Epoch: 7   Global Step: 87950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:05,644-Speed 3008.29 samples/sec   Loss 8.5126   LearningRate 0.0417   Epoch: 7   Global Step: 87960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:09,108-Speed 2956.78 samples/sec   Loss 8.6567   LearningRate 0.0417   Epoch: 7   Global Step: 87970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:12,497-Speed 3022.61 samples/sec   Loss 8.4714   LearningRate 0.0417   Epoch: 7   Global Step: 87980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:15,837-Speed 3066.69 samples/sec   Loss 8.6024   LearningRate 0.0417   Epoch: 7   Global Step: 87990   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:02:19,163-Speed 3079.94 samples/sec   Loss 8.5607   LearningRate 0.0417   Epoch: 7   Global Step: 88000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:22,506-Speed 3063.38 samples/sec   Loss 8.6084   LearningRate 0.0417   Epoch: 7   Global Step: 88010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:25,893-Speed 3024.57 samples/sec   Loss 8.6196   LearningRate 0.0417   Epoch: 7   Global Step: 88020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:29,227-Speed 3072.25 samples/sec   Loss 8.6666   LearningRate 0.0417   Epoch: 7   Global Step: 88030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:32,590-Speed 3045.94 samples/sec   Loss 8.6161   LearningRate 0.0417   Epoch: 7   Global Step: 88040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:35,908-Speed 3086.41 samples/sec   Loss 8.6209   LearningRate 0.0417   Epoch: 7   Global Step: 88050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:39,238-Speed 3075.71 samples/sec   Loss 8.6672   LearningRate 0.0417   Epoch: 7   Global Step: 88060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:42,596-Speed 3050.71 samples/sec   Loss 8.5922   LearningRate 0.0417   Epoch: 7   Global Step: 88070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:45,969-Speed 3036.18 samples/sec   Loss 8.4141   LearningRate 0.0417   Epoch: 7   Global Step: 88080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:49,346-Speed 3033.69 samples/sec   Loss 8.5021   LearningRate 0.0417   Epoch: 7   Global Step: 88090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:52,699-Speed 3054.26 samples/sec   Loss 8.6464   LearningRate 0.0416   Epoch: 7   Global Step: 88100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:02:56,086-Speed 3024.57 samples/sec   Loss 8.5815   LearningRate 0.0416   Epoch: 7   Global Step: 88110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:02:59,459-Speed 3037.08 samples/sec   Loss 8.6699   LearningRate 0.0416   Epoch: 7   Global Step: 88120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:02,882-Speed 2991.80 samples/sec   Loss 8.5568   LearningRate 0.0416   Epoch: 7   Global Step: 88130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:06,250-Speed 3041.68 samples/sec   Loss 8.5696   LearningRate 0.0416   Epoch: 7   Global Step: 88140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:09,570-Speed 3084.56 samples/sec   Loss 8.4597   LearningRate 0.0416   Epoch: 7   Global Step: 88150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:12,912-Speed 3065.77 samples/sec   Loss 8.6732   LearningRate 0.0416   Epoch: 7   Global Step: 88160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:16,247-Speed 3070.98 samples/sec   Loss 8.6871   LearningRate 0.0416   Epoch: 7   Global Step: 88170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:19,612-Speed 3044.42 samples/sec   Loss 8.5025   LearningRate 0.0416   Epoch: 7   Global Step: 88180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:22,940-Speed 3077.22 samples/sec   Loss 8.5586   LearningRate 0.0416   Epoch: 7   Global Step: 88190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:26,266-Speed 3080.07 samples/sec   Loss 8.6933   LearningRate 0.0416   Epoch: 7   Global Step: 88200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:29,608-Speed 3064.96 samples/sec   Loss 8.5506   LearningRate 0.0416   Epoch: 7   Global Step: 88210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:03:32,954-Speed 3061.36 samples/sec   Loss 8.4229   LearningRate 0.0416   Epoch: 7   Global Step: 88220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:03:36,412-Speed 2961.24 samples/sec   Loss 8.6029   LearningRate 0.0416   Epoch: 7   Global Step: 88230   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:03:39,810-Speed 3014.45 samples/sec   Loss 8.6661   LearningRate 0.0416   Epoch: 7   Global Step: 88240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:03:43,184-Speed 3036.68 samples/sec   Loss 8.7029   LearningRate 0.0416   Epoch: 7   Global Step: 88250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:46,555-Speed 3038.11 samples/sec   Loss 8.5381   LearningRate 0.0416   Epoch: 7   Global Step: 88260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:49,907-Speed 3056.43 samples/sec   Loss 8.6044   LearningRate 0.0416   Epoch: 7   Global Step: 88270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:53,299-Speed 3019.69 samples/sec   Loss 8.7334   LearningRate 0.0416   Epoch: 7   Global Step: 88280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:03:56,715-Speed 2997.89 samples/sec   Loss 8.6390   LearningRate 0.0416   Epoch: 7   Global Step: 88290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:04:00,187-Speed 2949.93 samples/sec   Loss 8.5757   LearningRate 0.0415   Epoch: 7   Global Step: 88300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:04:03,583-Speed 3016.48 samples/sec   Loss 8.6617   LearningRate 0.0415   Epoch: 7   Global Step: 88310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:04:06,912-Speed 3076.51 samples/sec   Loss 8.7498   LearningRate 0.0415   Epoch: 7   Global Step: 88320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:04:10,265-Speed 3054.85 samples/sec   Loss 8.6363   LearningRate 0.0415   Epoch: 7   Global Step: 88330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:04:13,638-Speed 3036.98 samples/sec   Loss 8.7777   LearningRate 0.0415   Epoch: 7   Global Step: 88340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:04:16,971-Speed 3072.87 samples/sec   Loss 8.6729   LearningRate 0.0415   Epoch: 7   Global Step: 88350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:04:20,308-Speed 3070.04 samples/sec   Loss 8.7891   LearningRate 0.0415   Epoch: 7   Global Step: 88360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:04:23,722-Speed 2999.52 samples/sec   Loss 8.5898   LearningRate 0.0415   Epoch: 7   Global Step: 88370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:04:27,095-Speed 3037.64 samples/sec   Loss 8.7945   LearningRate 0.0415   Epoch: 7   Global Step: 88380   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:04:30,410-Speed 3089.82 samples/sec   Loss 8.7316   LearningRate 0.0415   Epoch: 7   Global Step: 88390   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:04:33,913-Speed 2923.98 samples/sec   Loss 8.7206   LearningRate 0.0415   Epoch: 7   Global Step: 88400   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:04:37,326-Speed 3000.81 samples/sec   Loss 8.7759   LearningRate 0.0415   Epoch: 7   Global Step: 88410   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:04:40,759-Speed 2983.59 samples/sec   Loss 8.7385   LearningRate 0.0415   Epoch: 7   Global Step: 88420   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:04:44,267-Speed 2919.57 samples/sec   Loss 8.8082   LearningRate 0.0415   Epoch: 7   Global Step: 88430   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:04:47,713-Speed 2972.79 samples/sec   Loss 8.7018   LearningRate 0.0415   Epoch: 7   Global Step: 88440   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:04:51,137-Speed 2991.98 samples/sec   Loss 8.7600   LearningRate 0.0415   Epoch: 7   Global Step: 88450   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:04:54,452-Speed 3089.80 samples/sec   Loss 8.6831   LearningRate 0.0415   Epoch: 7   Global Step: 88460   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:04:57,818-Speed 3043.05 samples/sec   Loss 8.9472   LearningRate 0.0415   Epoch: 7   Global Step: 88470   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:01,287-Speed 2952.67 samples/sec   Loss 8.6828   LearningRate 0.0415   Epoch: 7   Global Step: 88480   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:04,712-Speed 2990.78 samples/sec   Loss 8.7276   LearningRate 0.0414   Epoch: 7   Global Step: 88490   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:08,045-Speed 3072.95 samples/sec   Loss 8.7176   LearningRate 0.0414   Epoch: 7   Global Step: 88500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:11,409-Speed 3044.12 samples/sec   Loss 8.8172   LearningRate 0.0414   Epoch: 7   Global Step: 88510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:14,747-Speed 3068.66 samples/sec   Loss 8.7217   LearningRate 0.0414   Epoch: 7   Global Step: 88520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:18,704-Speed 2589.02 samples/sec   Loss 8.8225   LearningRate 0.0414   Epoch: 7   Global Step: 88530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:22,031-Speed 3078.20 samples/sec   Loss 8.7922   LearningRate 0.0414   Epoch: 7   Global Step: 88540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:25,392-Speed 3048.04 samples/sec   Loss 8.8568   LearningRate 0.0414   Epoch: 7   Global Step: 88550   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-04-27 10:05:30,148-Speed 2153.32 samples/sec   Loss 8.8414   LearningRate 0.0414   Epoch: 7   Global Step: 88560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:33,576-Speed 2988.56 samples/sec   Loss 8.8066   LearningRate 0.0414   Epoch: 7   Global Step: 88570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:37,615-Speed 2535.96 samples/sec   Loss 8.6957   LearningRate 0.0414   Epoch: 7   Global Step: 88580   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:40,951-Speed 3069.96 samples/sec   Loss 8.7383   LearningRate 0.0414   Epoch: 7   Global Step: 88590   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:44,294-Speed 3064.27 samples/sec   Loss 8.7657   LearningRate 0.0414   Epoch: 7   Global Step: 88600   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:47,694-Speed 3012.81 samples/sec   Loss 8.7959   LearningRate 0.0414   Epoch: 7   Global Step: 88610   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:51,093-Speed 3013.30 samples/sec   Loss 8.6624   LearningRate 0.0414   Epoch: 7   Global Step: 88620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:54,503-Speed 3003.77 samples/sec   Loss 8.7722   LearningRate 0.0414   Epoch: 7   Global Step: 88630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:05:57,897-Speed 3017.36 samples/sec   Loss 8.5679   LearningRate 0.0414   Epoch: 7   Global Step: 88640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:06:01,322-Speed 2991.09 samples/sec   Loss 8.7849   LearningRate 0.0414   Epoch: 7   Global Step: 88650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:06:04,675-Speed 3054.76 samples/sec   Loss 8.8641   LearningRate 0.0414   Epoch: 7   Global Step: 88660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:06:08,037-Speed 3046.93 samples/sec   Loss 8.8363   LearningRate 0.0414   Epoch: 7   Global Step: 88670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:06:11,472-Speed 2981.71 samples/sec   Loss 8.8235   LearningRate 0.0413   Epoch: 7   Global Step: 88680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:06:14,857-Speed 3025.71 samples/sec   Loss 8.6101   LearningRate 0.0413   Epoch: 7   Global Step: 88690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:06:18,265-Speed 3005.65 samples/sec   Loss 8.9281   LearningRate 0.0413   Epoch: 7   Global Step: 88700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:06:21,687-Speed 2993.86 samples/sec   Loss 8.7739   LearningRate 0.0413   Epoch: 7   Global Step: 88710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:06:25,098-Speed 3002.77 samples/sec   Loss 8.7301   LearningRate 0.0413   Epoch: 7   Global Step: 88720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:06:28,484-Speed 3025.20 samples/sec   Loss 8.7709   LearningRate 0.0413   Epoch: 7   Global Step: 88730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:06:31,932-Speed 2970.67 samples/sec   Loss 8.8623   LearningRate 0.0413   Epoch: 7   Global Step: 88740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:06:35,396-Speed 2956.36 samples/sec   Loss 8.8185   LearningRate 0.0413   Epoch: 7   Global Step: 88750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:06:38,814-Speed 2996.80 samples/sec   Loss 8.9481   LearningRate 0.0413   Epoch: 7   Global Step: 88760   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:06:42,265-Speed 2968.38 samples/sec   Loss 8.8766   LearningRate 0.0413   Epoch: 7   Global Step: 88770   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:06:45,704-Speed 2978.77 samples/sec   Loss 8.8057   LearningRate 0.0413   Epoch: 7   Global Step: 88780   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:06:49,151-Speed 2971.23 samples/sec   Loss 8.8650   LearningRate 0.0413   Epoch: 7   Global Step: 88790   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:06:52,576-Speed 2991.25 samples/sec   Loss 8.8527   LearningRate 0.0413   Epoch: 7   Global Step: 88800   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:06:56,000-Speed 2991.05 samples/sec   Loss 8.8816   LearningRate 0.0413   Epoch: 7   Global Step: 88810   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:06:59,374-Speed 3035.75 samples/sec   Loss 8.7575   LearningRate 0.0413   Epoch: 7   Global Step: 88820   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:07:02,752-Speed 3032.20 samples/sec   Loss 8.8678   LearningRate 0.0413   Epoch: 7   Global Step: 88830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:07:06,136-Speed 3027.03 samples/sec   Loss 8.8171   LearningRate 0.0413   Epoch: 7   Global Step: 88840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:07:09,467-Speed 3074.27 samples/sec   Loss 8.9022   LearningRate 0.0413   Epoch: 7   Global Step: 88850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:07:12,829-Speed 3047.14 samples/sec   Loss 8.8757   LearningRate 0.0413   Epoch: 7   Global Step: 88860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:07:16,192-Speed 3045.73 samples/sec   Loss 8.9141   LearningRate 0.0412   Epoch: 7   Global Step: 88870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:07:19,548-Speed 3052.05 samples/sec   Loss 8.8886   LearningRate 0.0412   Epoch: 7   Global Step: 88880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:07:22,878-Speed 3076.28 samples/sec   Loss 8.7891   LearningRate 0.0412   Epoch: 7   Global Step: 88890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:07:26,192-Speed 3090.43 samples/sec   Loss 8.9329   LearningRate 0.0412   Epoch: 7   Global Step: 88900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:07:29,531-Speed 3068.36 samples/sec   Loss 8.8077   LearningRate 0.0412   Epoch: 7   Global Step: 88910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:07:33,490-Speed 2586.98 samples/sec   Loss 9.0181   LearningRate 0.0412   Epoch: 7   Global Step: 88920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:07:36,893-Speed 3009.73 samples/sec   Loss 8.8072   LearningRate 0.0412   Epoch: 7   Global Step: 88930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:07:40,243-Speed 3057.15 samples/sec   Loss 8.9655   LearningRate 0.0412   Epoch: 7   Global Step: 88940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:07:43,635-Speed 3020.77 samples/sec   Loss 8.8605   LearningRate 0.0412   Epoch: 7   Global Step: 88950   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:07:47,011-Speed 3033.57 samples/sec   Loss 8.9091   LearningRate 0.0412   Epoch: 7   Global Step: 88960   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:07:50,487-Speed 2947.91 samples/sec   Loss 8.8177   LearningRate 0.0412   Epoch: 7   Global Step: 88970   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:07:53,948-Speed 2960.03 samples/sec   Loss 8.9323   LearningRate 0.0412   Epoch: 7   Global Step: 88980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:07:57,354-Speed 3007.52 samples/sec   Loss 8.9706   LearningRate 0.0412   Epoch: 7   Global Step: 88990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:08:00,742-Speed 3023.37 samples/sec   Loss 8.8267   LearningRate 0.0412   Epoch: 7   Global Step: 89000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:08:04,259-Speed 2912.85 samples/sec   Loss 8.8299   LearningRate 0.0412   Epoch: 7   Global Step: 89010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:08:07,667-Speed 3005.70 samples/sec   Loss 8.8584   LearningRate 0.0412   Epoch: 7   Global Step: 89020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:08:11,110-Speed 2975.00 samples/sec   Loss 8.9487   LearningRate 0.0412   Epoch: 7   Global Step: 89030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:08:14,509-Speed 3013.03 samples/sec   Loss 8.9425   LearningRate 0.0412   Epoch: 7   Global Step: 89040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:08:17,894-Speed 3026.18 samples/sec   Loss 8.9646   LearningRate 0.0412   Epoch: 7   Global Step: 89050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:08:21,280-Speed 3025.39 samples/sec   Loss 8.8778   LearningRate 0.0412   Epoch: 7   Global Step: 89060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:08:24,722-Speed 2975.40 samples/sec   Loss 8.9107   LearningRate 0.0411   Epoch: 7   Global Step: 89070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:08:28,178-Speed 2964.25 samples/sec   Loss 8.7020   LearningRate 0.0411   Epoch: 7   Global Step: 89080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:08:31,566-Speed 3022.82 samples/sec   Loss 9.0227   LearningRate 0.0411   Epoch: 7   Global Step: 89090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:08:34,964-Speed 3014.53 samples/sec   Loss 8.9990   LearningRate 0.0411   Epoch: 7   Global Step: 89100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:08:38,348-Speed 3026.88 samples/sec   Loss 8.9012   LearningRate 0.0411   Epoch: 7   Global Step: 89110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:08:41,767-Speed 2995.75 samples/sec   Loss 8.8603   LearningRate 0.0411   Epoch: 7   Global Step: 89120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:08:45,159-Speed 3020.51 samples/sec   Loss 9.0431   LearningRate 0.0411   Epoch: 7   Global Step: 89130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:08:48,546-Speed 3023.82 samples/sec   Loss 8.7395   LearningRate 0.0411   Epoch: 7   Global Step: 89140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:08:51,889-Speed 3063.74 samples/sec   Loss 8.8137   LearningRate 0.0411   Epoch: 7   Global Step: 89150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:08:55,259-Speed 3039.91 samples/sec   Loss 8.9324   LearningRate 0.0411   Epoch: 7   Global Step: 89160   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:08:58,687-Speed 2987.34 samples/sec   Loss 8.9750   LearningRate 0.0411   Epoch: 7   Global Step: 89170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:09:02,060-Speed 3037.58 samples/sec   Loss 8.8807   LearningRate 0.0411   Epoch: 7   Global Step: 89180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:09:05,444-Speed 3026.39 samples/sec   Loss 8.9896   LearningRate 0.0411   Epoch: 7   Global Step: 89190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:09:08,898-Speed 2965.50 samples/sec   Loss 8.9443   LearningRate 0.0411   Epoch: 7   Global Step: 89200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:09:12,269-Speed 3038.33 samples/sec   Loss 8.8305   LearningRate 0.0411   Epoch: 7   Global Step: 89210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:09:15,643-Speed 3036.31 samples/sec   Loss 8.9914   LearningRate 0.0411   Epoch: 7   Global Step: 89220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:09:19,044-Speed 3010.95 samples/sec   Loss 9.0833   LearningRate 0.0411   Epoch: 7   Global Step: 89230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:09:22,541-Speed 2929.50 samples/sec   Loss 8.9433   LearningRate 0.0411   Epoch: 7   Global Step: 89240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:09:25,924-Speed 3027.54 samples/sec   Loss 8.9684   LearningRate 0.0411   Epoch: 7   Global Step: 89250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:09:29,298-Speed 3036.00 samples/sec   Loss 8.8717   LearningRate 0.0410   Epoch: 7   Global Step: 89260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:09:32,664-Speed 3042.86 samples/sec   Loss 8.9998   LearningRate 0.0410   Epoch: 7   Global Step: 89270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:09:36,079-Speed 2999.68 samples/sec   Loss 8.9021   LearningRate 0.0410   Epoch: 7   Global Step: 89280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:09:39,543-Speed 2956.54 samples/sec   Loss 8.9251   LearningRate 0.0410   Epoch: 7   Global Step: 89290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:09:42,884-Speed 3065.97 samples/sec   Loss 8.8692   LearningRate 0.0410   Epoch: 7   Global Step: 89300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:09:46,318-Speed 2983.35 samples/sec   Loss 8.9551   LearningRate 0.0410   Epoch: 7   Global Step: 89310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:09:49,727-Speed 3004.02 samples/sec   Loss 8.9684   LearningRate 0.0410   Epoch: 7   Global Step: 89320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:09:53,168-Speed 2976.91 samples/sec   Loss 8.9648   LearningRate 0.0410   Epoch: 7   Global Step: 89330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:09:56,510-Speed 3064.74 samples/sec   Loss 9.0265   LearningRate 0.0410   Epoch: 7   Global Step: 89340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:10:00,028-Speed 2911.74 samples/sec   Loss 9.0529   LearningRate 0.0410   Epoch: 7   Global Step: 89350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:10:03,515-Speed 2937.67 samples/sec   Loss 9.0528   LearningRate 0.0410   Epoch: 7   Global Step: 89360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:10:06,948-Speed 2983.46 samples/sec   Loss 8.9408   LearningRate 0.0410   Epoch: 7   Global Step: 89370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:10:10,401-Speed 2966.83 samples/sec   Loss 9.0694   LearningRate 0.0410   Epoch: 7   Global Step: 89380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:10:13,793-Speed 3019.50 samples/sec   Loss 8.9792   LearningRate 0.0410   Epoch: 7   Global Step: 89390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:10:17,133-Speed 3067.07 samples/sec   Loss 8.9914   LearningRate 0.0410   Epoch: 7   Global Step: 89400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:10:20,560-Speed 2989.10 samples/sec   Loss 9.0491   LearningRate 0.0410   Epoch: 7   Global Step: 89410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:10:23,988-Speed 2987.86 samples/sec   Loss 9.0314   LearningRate 0.0410   Epoch: 7   Global Step: 89420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:10:27,420-Speed 2984.70 samples/sec   Loss 8.8898   LearningRate 0.0410   Epoch: 7   Global Step: 89430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:10:30,922-Speed 2924.35 samples/sec   Loss 9.1615   LearningRate 0.0410   Epoch: 7   Global Step: 89440   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:10:34,267-Speed 3062.01 samples/sec   Loss 9.0874   LearningRate 0.0410   Epoch: 7   Global Step: 89450   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:10:37,672-Speed 3008.54 samples/sec   Loss 8.9138   LearningRate 0.0409   Epoch: 7   Global Step: 89460   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:10:41,035-Speed 3046.06 samples/sec   Loss 9.0770   LearningRate 0.0409   Epoch: 7   Global Step: 89470   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:10:44,356-Speed 3083.58 samples/sec   Loss 9.0917   LearningRate 0.0409   Epoch: 7   Global Step: 89480   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:10:47,730-Speed 3036.62 samples/sec   Loss 9.1051   LearningRate 0.0409   Epoch: 7   Global Step: 89490   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:10:51,038-Speed 3095.42 samples/sec   Loss 8.9367   LearningRate 0.0409   Epoch: 7   Global Step: 89500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:10:54,471-Speed 2984.64 samples/sec   Loss 9.0223   LearningRate 0.0409   Epoch: 7   Global Step: 89510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:10:57,839-Speed 3040.91 samples/sec   Loss 8.9664   LearningRate 0.0409   Epoch: 7   Global Step: 89520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:11:01,296-Speed 2963.35 samples/sec   Loss 9.1226   LearningRate 0.0409   Epoch: 7   Global Step: 89530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:11:04,673-Speed 3033.02 samples/sec   Loss 8.8551   LearningRate 0.0409   Epoch: 7   Global Step: 89540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:11:08,130-Speed 2962.23 samples/sec   Loss 9.1116   LearningRate 0.0409   Epoch: 7   Global Step: 89550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:11:11,493-Speed 3046.06 samples/sec   Loss 8.9471   LearningRate 0.0409   Epoch: 7   Global Step: 89560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:11:14,823-Speed 3076.35 samples/sec   Loss 8.9608   LearningRate 0.0409   Epoch: 7   Global Step: 89570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:11:18,152-Speed 3076.38 samples/sec   Loss 8.9719   LearningRate 0.0409   Epoch: 7   Global Step: 89580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:11:21,570-Speed 2996.90 samples/sec   Loss 9.0467   LearningRate 0.0409   Epoch: 7   Global Step: 89590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:11:24,947-Speed 3033.13 samples/sec   Loss 9.0241   LearningRate 0.0409   Epoch: 7   Global Step: 89600   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:11:28,354-Speed 3006.83 samples/sec   Loss 9.0247   LearningRate 0.0409   Epoch: 7   Global Step: 89610   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:11:31,739-Speed 3025.95 samples/sec   Loss 8.9936   LearningRate 0.0409   Epoch: 7   Global Step: 89620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:11:35,076-Speed 3069.11 samples/sec   Loss 8.9659   LearningRate 0.0409   Epoch: 7   Global Step: 89630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:11:38,428-Speed 3055.80 samples/sec   Loss 8.9770   LearningRate 0.0409   Epoch: 7   Global Step: 89640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:11:41,795-Speed 3042.54 samples/sec   Loss 8.9778   LearningRate 0.0408   Epoch: 7   Global Step: 89650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:11:45,143-Speed 3059.11 samples/sec   Loss 9.1349   LearningRate 0.0408   Epoch: 7   Global Step: 89660   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:11:48,465-Speed 3083.34 samples/sec   Loss 8.9212   LearningRate 0.0408   Epoch: 7   Global Step: 89670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:11:51,935-Speed 2952.15 samples/sec   Loss 9.0575   LearningRate 0.0408   Epoch: 7   Global Step: 89680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:11:55,337-Speed 3010.94 samples/sec   Loss 9.0360   LearningRate 0.0408   Epoch: 7   Global Step: 89690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:11:58,688-Speed 3056.90 samples/sec   Loss 9.0753   LearningRate 0.0408   Epoch: 7   Global Step: 89700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:12:02,076-Speed 3022.38 samples/sec   Loss 8.9904   LearningRate 0.0408   Epoch: 7   Global Step: 89710   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:12:05,449-Speed 3037.28 samples/sec   Loss 9.1048   LearningRate 0.0408   Epoch: 7   Global Step: 89720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:08,801-Speed 3056.17 samples/sec   Loss 8.9242   LearningRate 0.0408   Epoch: 7   Global Step: 89730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:12,173-Speed 3036.70 samples/sec   Loss 9.1564   LearningRate 0.0408   Epoch: 7   Global Step: 89740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:15,511-Speed 3069.50 samples/sec   Loss 8.9922   LearningRate 0.0408   Epoch: 7   Global Step: 89750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:18,861-Speed 3057.06 samples/sec   Loss 9.1362   LearningRate 0.0408   Epoch: 7   Global Step: 89760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:22,216-Speed 3052.58 samples/sec   Loss 9.0106   LearningRate 0.0408   Epoch: 7   Global Step: 89770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:25,624-Speed 3006.07 samples/sec   Loss 9.1287   LearningRate 0.0408   Epoch: 7   Global Step: 89780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:28,969-Speed 3063.11 samples/sec   Loss 9.0806   LearningRate 0.0408   Epoch: 7   Global Step: 89790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:32,438-Speed 2952.48 samples/sec   Loss 9.1127   LearningRate 0.0408   Epoch: 7   Global Step: 89800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:35,811-Speed 3036.88 samples/sec   Loss 8.9792   LearningRate 0.0408   Epoch: 7   Global Step: 89810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:39,189-Speed 3032.56 samples/sec   Loss 8.9939   LearningRate 0.0408   Epoch: 7   Global Step: 89820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:42,524-Speed 3071.23 samples/sec   Loss 8.9222   LearningRate 0.0408   Epoch: 7   Global Step: 89830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:45,933-Speed 3005.92 samples/sec   Loss 9.0604   LearningRate 0.0407   Epoch: 7   Global Step: 89840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:49,244-Speed 3093.02 samples/sec   Loss 9.1680   LearningRate 0.0407   Epoch: 7   Global Step: 89850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:52,641-Speed 3015.83 samples/sec   Loss 8.9649   LearningRate 0.0407   Epoch: 7   Global Step: 89860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:55,999-Speed 3049.81 samples/sec   Loss 9.1070   LearningRate 0.0407   Epoch: 7   Global Step: 89870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:12:59,363-Speed 3045.14 samples/sec   Loss 9.0568   LearningRate 0.0407   Epoch: 7   Global Step: 89880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:02,688-Speed 3080.60 samples/sec   Loss 9.0268   LearningRate 0.0407   Epoch: 7   Global Step: 89890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:06,051-Speed 3045.84 samples/sec   Loss 9.0046   LearningRate 0.0407   Epoch: 7   Global Step: 89900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:09,478-Speed 2989.01 samples/sec   Loss 8.9566   LearningRate 0.0407   Epoch: 7   Global Step: 89910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:12,807-Speed 3076.97 samples/sec   Loss 9.0083   LearningRate 0.0407   Epoch: 7   Global Step: 89920   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:13:16,177-Speed 3039.23 samples/sec   Loss 9.1000   LearningRate 0.0407   Epoch: 7   Global Step: 89930   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:13:19,486-Speed 3096.67 samples/sec   Loss 9.1193   LearningRate 0.0407   Epoch: 7   Global Step: 89940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:22,932-Speed 2973.01 samples/sec   Loss 9.1041   LearningRate 0.0407   Epoch: 7   Global Step: 89950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:26,327-Speed 3016.20 samples/sec   Loss 9.0322   LearningRate 0.0407   Epoch: 7   Global Step: 89960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:29,765-Speed 2980.47 samples/sec   Loss 9.0663   LearningRate 0.0407   Epoch: 7   Global Step: 89970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:33,230-Speed 2956.43 samples/sec   Loss 8.9947   LearningRate 0.0407   Epoch: 7   Global Step: 89980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:36,610-Speed 3030.34 samples/sec   Loss 9.1804   LearningRate 0.0407   Epoch: 7   Global Step: 89990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:40,007-Speed 3015.22 samples/sec   Loss 9.0825   LearningRate 0.0407   Epoch: 7   Global Step: 90000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:43,448-Speed 2976.80 samples/sec   Loss 9.0072   LearningRate 0.0407   Epoch: 7   Global Step: 90010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:46,930-Speed 2941.42 samples/sec   Loss 9.1141   LearningRate 0.0407   Epoch: 7   Global Step: 90020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:50,390-Speed 2961.01 samples/sec   Loss 9.0458   LearningRate 0.0407   Epoch: 7   Global Step: 90030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:13:53,908-Speed 2911.27 samples/sec   Loss 9.2579   LearningRate 0.0406   Epoch: 7   Global Step: 90040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:13:57,348-Speed 2977.23 samples/sec   Loss 9.1649   LearningRate 0.0406   Epoch: 7   Global Step: 90050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:14:00,710-Speed 3047.28 samples/sec   Loss 9.0170   LearningRate 0.0406   Epoch: 7   Global Step: 90060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:14:04,150-Speed 2977.46 samples/sec   Loss 9.0073   LearningRate 0.0406   Epoch: 7   Global Step: 90070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:14:07,574-Speed 2991.21 samples/sec   Loss 9.1533   LearningRate 0.0406   Epoch: 7   Global Step: 90080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:14:10,950-Speed 3034.03 samples/sec   Loss 9.1766   LearningRate 0.0406   Epoch: 7   Global Step: 90090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:14:14,290-Speed 3067.00 samples/sec   Loss 9.1252   LearningRate 0.0406   Epoch: 7   Global Step: 90100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:14:17,653-Speed 3046.37 samples/sec   Loss 9.1727   LearningRate 0.0406   Epoch: 7   Global Step: 90110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:14:21,061-Speed 3005.27 samples/sec   Loss 9.0984   LearningRate 0.0406   Epoch: 7   Global Step: 90120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:14:24,478-Speed 2997.16 samples/sec   Loss 9.0067   LearningRate 0.0406   Epoch: 7   Global Step: 90130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:14:27,861-Speed 3027.58 samples/sec   Loss 9.0024   LearningRate 0.0406   Epoch: 7   Global Step: 90140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:14:31,250-Speed 3022.91 samples/sec   Loss 9.0591   LearningRate 0.0406   Epoch: 7   Global Step: 90150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:14:34,614-Speed 3044.66 samples/sec   Loss 9.1600   LearningRate 0.0406   Epoch: 7   Global Step: 90160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:14:38,027-Speed 3001.67 samples/sec   Loss 8.9467   LearningRate 0.0406   Epoch: 7   Global Step: 90170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:14:41,476-Speed 2969.86 samples/sec   Loss 9.0760   LearningRate 0.0406   Epoch: 7   Global Step: 90180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:14:44,891-Speed 2999.15 samples/sec   Loss 9.0674   LearningRate 0.0406   Epoch: 7   Global Step: 90190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:14:48,383-Speed 2933.81 samples/sec   Loss 9.1916   LearningRate 0.0406   Epoch: 7   Global Step: 90200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:14:51,805-Speed 2992.50 samples/sec   Loss 9.1781   LearningRate 0.0406   Epoch: 7   Global Step: 90210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:14:55,181-Speed 3034.02 samples/sec   Loss 9.1218   LearningRate 0.0406   Epoch: 7   Global Step: 90220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:14:58,537-Speed 3052.88 samples/sec   Loss 8.9273   LearningRate 0.0405   Epoch: 7   Global Step: 90230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:15:01,973-Speed 2980.46 samples/sec   Loss 9.1006   LearningRate 0.0405   Epoch: 7   Global Step: 90240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:15:05,409-Speed 2985.09 samples/sec   Loss 9.1067   LearningRate 0.0405   Epoch: 7   Global Step: 90250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:15:08,828-Speed 2995.50 samples/sec   Loss 9.2770   LearningRate 0.0405   Epoch: 7   Global Step: 90260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:15:12,217-Speed 3022.78 samples/sec   Loss 9.0360   LearningRate 0.0405   Epoch: 7   Global Step: 90270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:15:15,574-Speed 3050.94 samples/sec   Loss 9.0142   LearningRate 0.0405   Epoch: 7   Global Step: 90280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:15:18,940-Speed 3043.66 samples/sec   Loss 9.0540   LearningRate 0.0405   Epoch: 7   Global Step: 90290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:15:22,265-Speed 3079.78 samples/sec   Loss 9.1429   LearningRate 0.0405   Epoch: 7   Global Step: 90300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:15:25,591-Speed 3079.57 samples/sec   Loss 9.1614   LearningRate 0.0405   Epoch: 7   Global Step: 90310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:15:28,940-Speed 3058.91 samples/sec   Loss 9.0701   LearningRate 0.0405   Epoch: 7   Global Step: 90320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:15:32,315-Speed 3034.64 samples/sec   Loss 9.0903   LearningRate 0.0405   Epoch: 7   Global Step: 90330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:15:35,799-Speed 2940.31 samples/sec   Loss 9.1612   LearningRate 0.0405   Epoch: 7   Global Step: 90340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:15:39,223-Speed 2992.07 samples/sec   Loss 9.0880   LearningRate 0.0405   Epoch: 7   Global Step: 90350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:15:42,671-Speed 2970.50 samples/sec   Loss 9.1082   LearningRate 0.0405   Epoch: 7   Global Step: 90360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:15:46,052-Speed 3029.97 samples/sec   Loss 9.0228   LearningRate 0.0405   Epoch: 7   Global Step: 90370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:15:49,406-Speed 3053.49 samples/sec   Loss 9.1425   LearningRate 0.0405   Epoch: 7   Global Step: 90380   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:15:52,859-Speed 2966.63 samples/sec   Loss 9.0460   LearningRate 0.0405   Epoch: 7   Global Step: 90390   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:15:56,222-Speed 3045.92 samples/sec   Loss 9.0226   LearningRate 0.0405   Epoch: 7   Global Step: 90400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:15:59,682-Speed 2959.79 samples/sec   Loss 9.0218   LearningRate 0.0405   Epoch: 7   Global Step: 90410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:16:03,071-Speed 3022.55 samples/sec   Loss 9.1065   LearningRate 0.0405   Epoch: 7   Global Step: 90420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:16:06,452-Speed 3030.11 samples/sec   Loss 9.0658   LearningRate 0.0404   Epoch: 7   Global Step: 90430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:16:09,823-Speed 3038.60 samples/sec   Loss 9.2124   LearningRate 0.0404   Epoch: 7   Global Step: 90440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:16:13,221-Speed 3014.43 samples/sec   Loss 9.0756   LearningRate 0.0404   Epoch: 7   Global Step: 90450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:16:16,617-Speed 3015.58 samples/sec   Loss 9.1882   LearningRate 0.0404   Epoch: 7   Global Step: 90460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:16:20,048-Speed 2985.45 samples/sec   Loss 9.1134   LearningRate 0.0404   Epoch: 7   Global Step: 90470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:16:23,473-Speed 2990.63 samples/sec   Loss 8.9329   LearningRate 0.0404   Epoch: 7   Global Step: 90480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:16:26,922-Speed 2969.90 samples/sec   Loss 9.0976   LearningRate 0.0404   Epoch: 7   Global Step: 90490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:16:30,419-Speed 2929.70 samples/sec   Loss 9.0460   LearningRate 0.0404   Epoch: 7   Global Step: 90500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:16:33,822-Speed 3009.00 samples/sec   Loss 9.0012   LearningRate 0.0404   Epoch: 7   Global Step: 90510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:16:37,202-Speed 3030.69 samples/sec   Loss 9.0834   LearningRate 0.0404   Epoch: 7   Global Step: 90520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:16:40,578-Speed 3034.91 samples/sec   Loss 9.0174   LearningRate 0.0404   Epoch: 7   Global Step: 90530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:16:43,945-Speed 3042.30 samples/sec   Loss 9.1547   LearningRate 0.0404   Epoch: 7   Global Step: 90540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:16:47,318-Speed 3036.41 samples/sec   Loss 9.1736   LearningRate 0.0404   Epoch: 7   Global Step: 90550   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:16:50,776-Speed 2962.43 samples/sec   Loss 9.2045   LearningRate 0.0404   Epoch: 7   Global Step: 90560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:16:54,119-Speed 3063.64 samples/sec   Loss 9.0159   LearningRate 0.0404   Epoch: 7   Global Step: 90570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:16:57,458-Speed 3067.58 samples/sec   Loss 9.3973   LearningRate 0.0404   Epoch: 7   Global Step: 90580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:17:00,776-Speed 3086.88 samples/sec   Loss 9.0767   LearningRate 0.0404   Epoch: 7   Global Step: 90590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:17:04,114-Speed 3068.80 samples/sec   Loss 9.1124   LearningRate 0.0404   Epoch: 7   Global Step: 90600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:17:07,426-Speed 3092.96 samples/sec   Loss 9.2056   LearningRate 0.0404   Epoch: 7   Global Step: 90610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:17:10,769-Speed 3064.22 samples/sec   Loss 9.1909   LearningRate 0.0403   Epoch: 7   Global Step: 90620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:17:14,150-Speed 3029.30 samples/sec   Loss 9.0000   LearningRate 0.0403   Epoch: 7   Global Step: 90630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:17:17,485-Speed 3071.38 samples/sec   Loss 9.1793   LearningRate 0.0403   Epoch: 7   Global Step: 90640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:17:20,870-Speed 3025.58 samples/sec   Loss 9.0551   LearningRate 0.0403   Epoch: 7   Global Step: 90650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:17:24,278-Speed 3005.41 samples/sec   Loss 9.0951   LearningRate 0.0403   Epoch: 7   Global Step: 90660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:17:27,717-Speed 2978.59 samples/sec   Loss 9.0984   LearningRate 0.0403   Epoch: 7   Global Step: 90670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:17:31,152-Speed 2982.25 samples/sec   Loss 9.1860   LearningRate 0.0403   Epoch: 7   Global Step: 90680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:17:34,502-Speed 3057.55 samples/sec   Loss 9.1463   LearningRate 0.0403   Epoch: 7   Global Step: 90690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:17:37,858-Speed 3051.66 samples/sec   Loss 9.3023   LearningRate 0.0403   Epoch: 7   Global Step: 90700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:17:41,232-Speed 3036.69 samples/sec   Loss 9.1949   LearningRate 0.0403   Epoch: 7   Global Step: 90710   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:17:44,583-Speed 3055.88 samples/sec   Loss 9.1094   LearningRate 0.0403   Epoch: 7   Global Step: 90720   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:17:48,058-Speed 2948.03 samples/sec   Loss 9.2244   LearningRate 0.0403   Epoch: 7   Global Step: 90730   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:17:51,461-Speed 3009.93 samples/sec   Loss 9.1936   LearningRate 0.0403   Epoch: 7   Global Step: 90740   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:17:54,877-Speed 2998.59 samples/sec   Loss 9.3003   LearningRate 0.0403   Epoch: 7   Global Step: 90750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:17:58,362-Speed 2939.21 samples/sec   Loss 9.1378   LearningRate 0.0403   Epoch: 7   Global Step: 90760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:01,793-Speed 2984.84 samples/sec   Loss 8.9902   LearningRate 0.0403   Epoch: 7   Global Step: 90770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:05,140-Speed 3060.26 samples/sec   Loss 9.1417   LearningRate 0.0403   Epoch: 7   Global Step: 90780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:08,562-Speed 2993.43 samples/sec   Loss 9.2030   LearningRate 0.0403   Epoch: 7   Global Step: 90790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:11,971-Speed 3004.54 samples/sec   Loss 9.1216   LearningRate 0.0403   Epoch: 7   Global Step: 90800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:15,388-Speed 2998.35 samples/sec   Loss 9.1754   LearningRate 0.0403   Epoch: 7   Global Step: 90810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:18,810-Speed 2993.36 samples/sec   Loss 9.1331   LearningRate 0.0402   Epoch: 7   Global Step: 90820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:22,225-Speed 2998.72 samples/sec   Loss 8.9983   LearningRate 0.0402   Epoch: 7   Global Step: 90830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:25,630-Speed 3008.62 samples/sec   Loss 9.2045   LearningRate 0.0402   Epoch: 7   Global Step: 90840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:29,039-Speed 3005.09 samples/sec   Loss 9.1268   LearningRate 0.0402   Epoch: 7   Global Step: 90850   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:18:32,459-Speed 2994.81 samples/sec   Loss 9.1795   LearningRate 0.0402   Epoch: 7   Global Step: 90860   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:18:35,843-Speed 3026.65 samples/sec   Loss 9.1537   LearningRate 0.0402   Epoch: 7   Global Step: 90870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:18:39,275-Speed 2984.57 samples/sec   Loss 9.1172   LearningRate 0.0402   Epoch: 7   Global Step: 90880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:42,731-Speed 2963.60 samples/sec   Loss 9.1037   LearningRate 0.0402   Epoch: 7   Global Step: 90890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:46,152-Speed 2994.50 samples/sec   Loss 9.1355   LearningRate 0.0402   Epoch: 7   Global Step: 90900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:49,562-Speed 3003.71 samples/sec   Loss 9.1430   LearningRate 0.0402   Epoch: 7   Global Step: 90910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:52,911-Speed 3058.78 samples/sec   Loss 9.1474   LearningRate 0.0402   Epoch: 7   Global Step: 90920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:56,262-Speed 3056.29 samples/sec   Loss 9.0480   LearningRate 0.0402   Epoch: 7   Global Step: 90930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:18:59,815-Speed 2883.39 samples/sec   Loss 9.2274   LearningRate 0.0402   Epoch: 7   Global Step: 90940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:19:03,192-Speed 3032.65 samples/sec   Loss 9.0960   LearningRate 0.0402   Epoch: 7   Global Step: 90950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:19:06,527-Speed 3071.28 samples/sec   Loss 9.1639   LearningRate 0.0402   Epoch: 7   Global Step: 90960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:19:09,865-Speed 3068.98 samples/sec   Loss 9.2096   LearningRate 0.0402   Epoch: 7   Global Step: 90970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:19:13,229-Speed 3044.69 samples/sec   Loss 9.1024   LearningRate 0.0402   Epoch: 7   Global Step: 90980   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:16,632-Speed 3010.59 samples/sec   Loss 9.1914   LearningRate 0.0402   Epoch: 7   Global Step: 90990   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:19,977-Speed 3061.52 samples/sec   Loss 9.2601   LearningRate 0.0402   Epoch: 7   Global Step: 91000   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:23,389-Speed 3002.63 samples/sec   Loss 8.9977   LearningRate 0.0402   Epoch: 7   Global Step: 91010   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:26,780-Speed 3020.75 samples/sec   Loss 9.1218   LearningRate 0.0401   Epoch: 7   Global Step: 91020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:30,238-Speed 2962.30 samples/sec   Loss 9.1553   LearningRate 0.0401   Epoch: 7   Global Step: 91030   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:33,722-Speed 2939.32 samples/sec   Loss 9.2405   LearningRate 0.0401   Epoch: 7   Global Step: 91040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:37,205-Speed 2940.62 samples/sec   Loss 9.1610   LearningRate 0.0401   Epoch: 7   Global Step: 91050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:40,684-Speed 2944.78 samples/sec   Loss 9.1366   LearningRate 0.0401   Epoch: 7   Global Step: 91060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:44,163-Speed 2944.19 samples/sec   Loss 9.2828   LearningRate 0.0401   Epoch: 7   Global Step: 91070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:47,552-Speed 3022.11 samples/sec   Loss 9.1007   LearningRate 0.0401   Epoch: 7   Global Step: 91080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:51,058-Speed 2921.74 samples/sec   Loss 9.1104   LearningRate 0.0401   Epoch: 7   Global Step: 91090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:54,418-Speed 3048.71 samples/sec   Loss 9.0853   LearningRate 0.0401   Epoch: 7   Global Step: 91100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:19:57,784-Speed 3043.35 samples/sec   Loss 9.1492   LearningRate 0.0401   Epoch: 7   Global Step: 91110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:01,291-Speed 2920.44 samples/sec   Loss 9.0812   LearningRate 0.0401   Epoch: 7   Global Step: 91120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:04,647-Speed 3052.68 samples/sec   Loss 9.0170   LearningRate 0.0401   Epoch: 7   Global Step: 91130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:08,119-Speed 2950.27 samples/sec   Loss 9.0878   LearningRate 0.0401   Epoch: 7   Global Step: 91140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:11,506-Speed 3024.63 samples/sec   Loss 9.1964   LearningRate 0.0401   Epoch: 7   Global Step: 91150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:14,833-Speed 3078.99 samples/sec   Loss 9.1612   LearningRate 0.0401   Epoch: 7   Global Step: 91160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:18,156-Speed 3081.98 samples/sec   Loss 9.0271   LearningRate 0.0401   Epoch: 7   Global Step: 91170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:21,506-Speed 3058.58 samples/sec   Loss 9.1139   LearningRate 0.0401   Epoch: 7   Global Step: 91180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:24,914-Speed 3004.88 samples/sec   Loss 9.1061   LearningRate 0.0401   Epoch: 7   Global Step: 91190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:28,269-Speed 3053.41 samples/sec   Loss 9.1235   LearningRate 0.0401   Epoch: 7   Global Step: 91200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:31,660-Speed 3020.00 samples/sec   Loss 9.0985   LearningRate 0.0400   Epoch: 7   Global Step: 91210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:35,118-Speed 2963.89 samples/sec   Loss 9.1875   LearningRate 0.0400   Epoch: 7   Global Step: 91220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:38,537-Speed 2996.49 samples/sec   Loss 9.1969   LearningRate 0.0400   Epoch: 7   Global Step: 91230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:41,869-Speed 3073.77 samples/sec   Loss 9.0515   LearningRate 0.0400   Epoch: 7   Global Step: 91240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:45,249-Speed 3030.92 samples/sec   Loss 9.0517   LearningRate 0.0400   Epoch: 7   Global Step: 91250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:48,587-Speed 3068.54 samples/sec   Loss 9.2547   LearningRate 0.0400   Epoch: 7   Global Step: 91260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:51,980-Speed 3019.08 samples/sec   Loss 9.0727   LearningRate 0.0400   Epoch: 7   Global Step: 91270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:55,338-Speed 3049.65 samples/sec   Loss 9.1181   LearningRate 0.0400   Epoch: 7   Global Step: 91280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:20:58,758-Speed 2995.35 samples/sec   Loss 9.2304   LearningRate 0.0400   Epoch: 7   Global Step: 91290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:21:02,111-Speed 3054.47 samples/sec   Loss 9.1333   LearningRate 0.0400   Epoch: 7   Global Step: 91300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:21:05,487-Speed 3034.92 samples/sec   Loss 9.1819   LearningRate 0.0400   Epoch: 7   Global Step: 91310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:21:08,916-Speed 2987.29 samples/sec   Loss 9.1355   LearningRate 0.0400   Epoch: 7   Global Step: 91320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:21:12,295-Speed 3031.33 samples/sec   Loss 9.2804   LearningRate 0.0400   Epoch: 7   Global Step: 91330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:21:15,653-Speed 3051.37 samples/sec   Loss 9.0987   LearningRate 0.0400   Epoch: 7   Global Step: 91340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:21:19,036-Speed 3027.15 samples/sec   Loss 9.2681   LearningRate 0.0400   Epoch: 7   Global Step: 91350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:21:22,404-Speed 3041.98 samples/sec   Loss 9.1404   LearningRate 0.0400   Epoch: 7   Global Step: 91360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:21:25,733-Speed 3076.58 samples/sec   Loss 9.2019   LearningRate 0.0400   Epoch: 7   Global Step: 91370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:21:29,115-Speed 3028.69 samples/sec   Loss 8.9826   LearningRate 0.0400   Epoch: 7   Global Step: 91380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:21:32,526-Speed 3003.18 samples/sec   Loss 9.1828   LearningRate 0.0400   Epoch: 7   Global Step: 91390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:21:35,924-Speed 3014.39 samples/sec   Loss 9.1357   LearningRate 0.0400   Epoch: 7   Global Step: 91400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:21:39,323-Speed 3012.68 samples/sec   Loss 9.2415   LearningRate 0.0399   Epoch: 7   Global Step: 91410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:21:42,734-Speed 3003.05 samples/sec   Loss 9.0227   LearningRate 0.0399   Epoch: 7   Global Step: 91420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:21:46,155-Speed 2994.40 samples/sec   Loss 9.1433   LearningRate 0.0399   Epoch: 7   Global Step: 91430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:21:49,627-Speed 2950.43 samples/sec   Loss 9.1675   LearningRate 0.0399   Epoch: 7   Global Step: 91440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:21:53,000-Speed 3036.24 samples/sec   Loss 9.1369   LearningRate 0.0399   Epoch: 7   Global Step: 91450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:21:56,415-Speed 2999.61 samples/sec   Loss 9.1907   LearningRate 0.0399   Epoch: 7   Global Step: 91460   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:21:59,786-Speed 3038.99 samples/sec   Loss 9.2669   LearningRate 0.0399   Epoch: 7   Global Step: 91470   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:22:03,135-Speed 3058.18 samples/sec   Loss 9.0855   LearningRate 0.0399   Epoch: 7   Global Step: 91480   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:22:06,503-Speed 3041.57 samples/sec   Loss 9.2340   LearningRate 0.0399   Epoch: 7   Global Step: 91490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:22:09,856-Speed 3054.51 samples/sec   Loss 9.1951   LearningRate 0.0399   Epoch: 7   Global Step: 91500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:22:13,213-Speed 3051.42 samples/sec   Loss 9.1888   LearningRate 0.0399   Epoch: 7   Global Step: 91510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:22:16,669-Speed 2963.64 samples/sec   Loss 9.1226   LearningRate 0.0399   Epoch: 7   Global Step: 91520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:22:20,061-Speed 3020.11 samples/sec   Loss 9.0709   LearningRate 0.0399   Epoch: 7   Global Step: 91530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:22:23,500-Speed 2978.60 samples/sec   Loss 9.0868   LearningRate 0.0399   Epoch: 7   Global Step: 91540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:22:26,910-Speed 3002.97 samples/sec   Loss 9.1502   LearningRate 0.0399   Epoch: 7   Global Step: 91550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:22:30,295-Speed 3026.52 samples/sec   Loss 9.2627   LearningRate 0.0399   Epoch: 7   Global Step: 91560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:22:33,620-Speed 3080.25 samples/sec   Loss 9.0323   LearningRate 0.0399   Epoch: 7   Global Step: 91570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:22:36,934-Speed 3091.08 samples/sec   Loss 9.2442   LearningRate 0.0399   Epoch: 7   Global Step: 91580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:22:40,391-Speed 2962.72 samples/sec   Loss 9.1462   LearningRate 0.0399   Epoch: 7   Global Step: 91590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:22:43,768-Speed 3033.66 samples/sec   Loss 9.2292   LearningRate 0.0399   Epoch: 7   Global Step: 91600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:22:47,091-Speed 3081.83 samples/sec   Loss 9.1483   LearningRate 0.0398   Epoch: 7   Global Step: 91610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:22:50,523-Speed 2984.89 samples/sec   Loss 9.1846   LearningRate 0.0398   Epoch: 7   Global Step: 91620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:22:53,908-Speed 3025.91 samples/sec   Loss 8.9830   LearningRate 0.0398   Epoch: 7   Global Step: 91630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:22:57,284-Speed 3033.72 samples/sec   Loss 9.0999   LearningRate 0.0398   Epoch: 7   Global Step: 91640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:23:00,674-Speed 3021.54 samples/sec   Loss 9.2521   LearningRate 0.0398   Epoch: 7   Global Step: 91650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:23:04,101-Speed 2988.71 samples/sec   Loss 9.2596   LearningRate 0.0398   Epoch: 7   Global Step: 91660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:23:07,564-Speed 2958.26 samples/sec   Loss 9.1331   LearningRate 0.0398   Epoch: 7   Global Step: 91670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:23:10,927-Speed 3045.29 samples/sec   Loss 8.9684   LearningRate 0.0398   Epoch: 7   Global Step: 91680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:23:14,285-Speed 3050.41 samples/sec   Loss 9.1814   LearningRate 0.0398   Epoch: 7   Global Step: 91690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:23:17,679-Speed 3018.61 samples/sec   Loss 9.1515   LearningRate 0.0398   Epoch: 7   Global Step: 91700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:23:21,071-Speed 3019.95 samples/sec   Loss 9.0844   LearningRate 0.0398   Epoch: 7   Global Step: 91710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:23:24,574-Speed 2923.93 samples/sec   Loss 9.1215   LearningRate 0.0398   Epoch: 7   Global Step: 91720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:23:28,013-Speed 2977.68 samples/sec   Loss 9.0823   LearningRate 0.0398   Epoch: 7   Global Step: 91730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:23:31,445-Speed 2984.70 samples/sec   Loss 9.2279   LearningRate 0.0398   Epoch: 7   Global Step: 91740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:23:34,790-Speed 3061.96 samples/sec   Loss 9.3143   LearningRate 0.0398   Epoch: 7   Global Step: 91750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:23:38,150-Speed 3049.90 samples/sec   Loss 9.1482   LearningRate 0.0398   Epoch: 7   Global Step: 91760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:23:41,508-Speed 3050.39 samples/sec   Loss 9.2974   LearningRate 0.0398   Epoch: 7   Global Step: 91770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:23:44,955-Speed 2971.92 samples/sec   Loss 9.1688   LearningRate 0.0398   Epoch: 7   Global Step: 91780   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:23:48,351-Speed 3015.65 samples/sec   Loss 9.2636   LearningRate 0.0398   Epoch: 7   Global Step: 91790   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:23:51,776-Speed 2991.43 samples/sec   Loss 9.1739   LearningRate 0.0397   Epoch: 7   Global Step: 91800   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:23:55,121-Speed 3061.80 samples/sec   Loss 9.1269   LearningRate 0.0397   Epoch: 7   Global Step: 91810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:23:58,458-Speed 3070.21 samples/sec   Loss 9.3067   LearningRate 0.0397   Epoch: 7   Global Step: 91820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:01,885-Speed 2988.76 samples/sec   Loss 9.0654   LearningRate 0.0397   Epoch: 7   Global Step: 91830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:05,331-Speed 2972.37 samples/sec   Loss 9.1309   LearningRate 0.0397   Epoch: 7   Global Step: 91840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:08,660-Speed 3076.79 samples/sec   Loss 9.1515   LearningRate 0.0397   Epoch: 7   Global Step: 91850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:11,991-Speed 3075.06 samples/sec   Loss 9.0268   LearningRate 0.0397   Epoch: 7   Global Step: 91860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:15,429-Speed 2980.13 samples/sec   Loss 9.1884   LearningRate 0.0397   Epoch: 7   Global Step: 91870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:18,821-Speed 3018.84 samples/sec   Loss 9.1367   LearningRate 0.0397   Epoch: 7   Global Step: 91880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:22,217-Speed 3016.68 samples/sec   Loss 9.1771   LearningRate 0.0397   Epoch: 7   Global Step: 91890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:25,606-Speed 3022.40 samples/sec   Loss 9.2061   LearningRate 0.0397   Epoch: 7   Global Step: 91900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:28,972-Speed 3043.19 samples/sec   Loss 9.1661   LearningRate 0.0397   Epoch: 7   Global Step: 91910   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:24:32,394-Speed 2993.22 samples/sec   Loss 9.1425   LearningRate 0.0397   Epoch: 7   Global Step: 91920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:35,765-Speed 3039.29 samples/sec   Loss 9.0099   LearningRate 0.0397   Epoch: 7   Global Step: 91930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:39,199-Speed 2981.88 samples/sec   Loss 9.1978   LearningRate 0.0397   Epoch: 7   Global Step: 91940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:42,622-Speed 2993.07 samples/sec   Loss 9.1545   LearningRate 0.0397   Epoch: 7   Global Step: 91950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:45,977-Speed 3052.88 samples/sec   Loss 9.0647   LearningRate 0.0397   Epoch: 7   Global Step: 91960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:49,433-Speed 2964.05 samples/sec   Loss 9.1536   LearningRate 0.0397   Epoch: 7   Global Step: 91970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:52,919-Speed 2938.30 samples/sec   Loss 9.1747   LearningRate 0.0397   Epoch: 7   Global Step: 91980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:56,285-Speed 3042.94 samples/sec   Loss 9.1633   LearningRate 0.0397   Epoch: 7   Global Step: 91990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:24:59,645-Speed 3048.33 samples/sec   Loss 9.1516   LearningRate 0.0396   Epoch: 7   Global Step: 92000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:25:03,039-Speed 3018.40 samples/sec   Loss 9.1454   LearningRate 0.0396   Epoch: 7   Global Step: 92010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:25:06,379-Speed 3067.04 samples/sec   Loss 9.1392   LearningRate 0.0396   Epoch: 7   Global Step: 92020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:25:09,730-Speed 3056.44 samples/sec   Loss 9.1738   LearningRate 0.0396   Epoch: 7   Global Step: 92030   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:25:13,209-Speed 2943.69 samples/sec   Loss 9.1685   LearningRate 0.0396   Epoch: 7   Global Step: 92040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:25:16,569-Speed 3048.81 samples/sec   Loss 9.2109   LearningRate 0.0396   Epoch: 7   Global Step: 92050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:25:19,923-Speed 3054.21 samples/sec   Loss 9.2740   LearningRate 0.0396   Epoch: 7   Global Step: 92060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:25:23,294-Speed 3038.23 samples/sec   Loss 9.2095   LearningRate 0.0396   Epoch: 7   Global Step: 92070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:25:26,630-Speed 3070.33 samples/sec   Loss 9.2236   LearningRate 0.0396   Epoch: 7   Global Step: 92080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:25:29,956-Speed 3079.62 samples/sec   Loss 9.1149   LearningRate 0.0396   Epoch: 7   Global Step: 92090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:25:33,308-Speed 3055.63 samples/sec   Loss 9.1290   LearningRate 0.0396   Epoch: 7   Global Step: 92100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:25:36,715-Speed 3006.93 samples/sec   Loss 8.9805   LearningRate 0.0396   Epoch: 7   Global Step: 92110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:25:40,072-Speed 3051.07 samples/sec   Loss 9.1512   LearningRate 0.0396   Epoch: 7   Global Step: 92120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:25:43,463-Speed 3020.37 samples/sec   Loss 9.0936   LearningRate 0.0396   Epoch: 7   Global Step: 92130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:25:46,826-Speed 3046.79 samples/sec   Loss 9.0041   LearningRate 0.0396   Epoch: 7   Global Step: 92140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:25:50,230-Speed 3009.00 samples/sec   Loss 9.1959   LearningRate 0.0396   Epoch: 7   Global Step: 92150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:25:53,612-Speed 3029.20 samples/sec   Loss 9.0799   LearningRate 0.0396   Epoch: 7   Global Step: 92160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:25:56,965-Speed 3054.08 samples/sec   Loss 9.1777   LearningRate 0.0396   Epoch: 7   Global Step: 92170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:26:00,276-Speed 3094.00 samples/sec   Loss 9.0644   LearningRate 0.0396   Epoch: 7   Global Step: 92180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:03,638-Speed 3046.39 samples/sec   Loss 9.1406   LearningRate 0.0396   Epoch: 7   Global Step: 92190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:07,072-Speed 2982.76 samples/sec   Loss 9.1789   LearningRate 0.0395   Epoch: 7   Global Step: 92200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:10,486-Speed 3000.75 samples/sec   Loss 9.0648   LearningRate 0.0395   Epoch: 7   Global Step: 92210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:13,893-Speed 3006.35 samples/sec   Loss 9.1909   LearningRate 0.0395   Epoch: 7   Global Step: 92220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:17,302-Speed 3004.41 samples/sec   Loss 8.9778   LearningRate 0.0395   Epoch: 7   Global Step: 92230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:20,693-Speed 3020.93 samples/sec   Loss 9.0777   LearningRate 0.0395   Epoch: 7   Global Step: 92240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:24,065-Speed 3037.77 samples/sec   Loss 9.1611   LearningRate 0.0395   Epoch: 7   Global Step: 92250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:27,451-Speed 3025.10 samples/sec   Loss 9.1239   LearningRate 0.0395   Epoch: 7   Global Step: 92260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:30,936-Speed 2939.07 samples/sec   Loss 9.1467   LearningRate 0.0395   Epoch: 7   Global Step: 92270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:34,315-Speed 3031.93 samples/sec   Loss 8.9850   LearningRate 0.0395   Epoch: 7   Global Step: 92280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:37,685-Speed 3039.57 samples/sec   Loss 9.1011   LearningRate 0.0395   Epoch: 7   Global Step: 92290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:41,037-Speed 3055.73 samples/sec   Loss 9.0974   LearningRate 0.0395   Epoch: 7   Global Step: 92300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:44,425-Speed 3023.64 samples/sec   Loss 9.0560   LearningRate 0.0395   Epoch: 7   Global Step: 92310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:47,890-Speed 2955.84 samples/sec   Loss 9.0592   LearningRate 0.0395   Epoch: 7   Global Step: 92320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:51,208-Speed 3087.34 samples/sec   Loss 9.1497   LearningRate 0.0395   Epoch: 7   Global Step: 92330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:54,578-Speed 3039.92 samples/sec   Loss 9.1933   LearningRate 0.0395   Epoch: 7   Global Step: 92340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:26:57,942-Speed 3044.93 samples/sec   Loss 9.2674   LearningRate 0.0395   Epoch: 7   Global Step: 92350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:27:01,332-Speed 3020.90 samples/sec   Loss 9.0248   LearningRate 0.0395   Epoch: 7   Global Step: 92360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:27:04,808-Speed 2947.40 samples/sec   Loss 9.2145   LearningRate 0.0395   Epoch: 7   Global Step: 92370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:27:08,160-Speed 3055.68 samples/sec   Loss 9.3523   LearningRate 0.0395   Epoch: 7   Global Step: 92380   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:27:11,584-Speed 2990.50 samples/sec   Loss 9.2008   LearningRate 0.0394   Epoch: 7   Global Step: 92390   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:27:14,941-Speed 3052.32 samples/sec   Loss 9.1394   LearningRate 0.0394   Epoch: 7   Global Step: 92400   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:27:18,266-Speed 3080.34 samples/sec   Loss 9.2517   LearningRate 0.0394   Epoch: 7   Global Step: 92410   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:27:21,624-Speed 3050.11 samples/sec   Loss 9.1648   LearningRate 0.0394   Epoch: 7   Global Step: 92420   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:27:24,978-Speed 3053.86 samples/sec   Loss 9.1984   LearningRate 0.0394   Epoch: 7   Global Step: 92430   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:27:28,291-Speed 3091.89 samples/sec   Loss 9.0986   LearningRate 0.0394   Epoch: 7   Global Step: 92440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:27:31,754-Speed 2958.16 samples/sec   Loss 9.0944   LearningRate 0.0394   Epoch: 7   Global Step: 92450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:27:35,061-Speed 3096.61 samples/sec   Loss 9.2520   LearningRate 0.0394   Epoch: 7   Global Step: 92460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:27:38,390-Speed 3077.17 samples/sec   Loss 9.1555   LearningRate 0.0394   Epoch: 7   Global Step: 92470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:27:41,751-Speed 3048.31 samples/sec   Loss 9.2865   LearningRate 0.0394   Epoch: 7   Global Step: 92480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:27:45,137-Speed 3024.92 samples/sec   Loss 9.1492   LearningRate 0.0394   Epoch: 7   Global Step: 92490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:27:48,517-Speed 3030.22 samples/sec   Loss 9.1260   LearningRate 0.0394   Epoch: 7   Global Step: 92500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:27:51,912-Speed 3016.97 samples/sec   Loss 9.1048   LearningRate 0.0394   Epoch: 7   Global Step: 92510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:27:55,253-Speed 3066.52 samples/sec   Loss 9.0633   LearningRate 0.0394   Epoch: 7   Global Step: 92520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:27:58,579-Speed 3079.56 samples/sec   Loss 9.0958   LearningRate 0.0394   Epoch: 7   Global Step: 92530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:28:01,982-Speed 3009.52 samples/sec   Loss 9.1255   LearningRate 0.0394   Epoch: 7   Global Step: 92540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:28:05,461-Speed 2944.82 samples/sec   Loss 9.2521   LearningRate 0.0394   Epoch: 7   Global Step: 92550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:08,881-Speed 2995.43 samples/sec   Loss 9.2494   LearningRate 0.0394   Epoch: 7   Global Step: 92560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:12,279-Speed 3013.60 samples/sec   Loss 9.1877   LearningRate 0.0394   Epoch: 7   Global Step: 92570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:15,623-Speed 3063.25 samples/sec   Loss 9.1315   LearningRate 0.0394   Epoch: 7   Global Step: 92580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:18,978-Speed 3053.72 samples/sec   Loss 9.0946   LearningRate 0.0393   Epoch: 7   Global Step: 92590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:22,351-Speed 3036.08 samples/sec   Loss 9.2183   LearningRate 0.0393   Epoch: 7   Global Step: 92600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:25,810-Speed 2961.17 samples/sec   Loss 9.1244   LearningRate 0.0393   Epoch: 7   Global Step: 92610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:29,160-Speed 3057.93 samples/sec   Loss 9.2476   LearningRate 0.0393   Epoch: 7   Global Step: 92620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:32,543-Speed 3028.18 samples/sec   Loss 9.0062   LearningRate 0.0393   Epoch: 7   Global Step: 92630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:35,900-Speed 3051.25 samples/sec   Loss 9.1342   LearningRate 0.0393   Epoch: 7   Global Step: 92640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:39,255-Speed 3052.81 samples/sec   Loss 9.0894   LearningRate 0.0393   Epoch: 7   Global Step: 92650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:28:42,578-Speed 3081.96 samples/sec   Loss 9.2032   LearningRate 0.0393   Epoch: 7   Global Step: 92660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:46,016-Speed 2979.67 samples/sec   Loss 9.0718   LearningRate 0.0393   Epoch: 7   Global Step: 92670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:49,366-Speed 3057.37 samples/sec   Loss 9.2426   LearningRate 0.0393   Epoch: 7   Global Step: 92680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:52,792-Speed 2989.87 samples/sec   Loss 9.0153   LearningRate 0.0393   Epoch: 7   Global Step: 92690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:56,178-Speed 3025.38 samples/sec   Loss 9.1030   LearningRate 0.0393   Epoch: 7   Global Step: 92700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:28:59,497-Speed 3085.84 samples/sec   Loss 9.1377   LearningRate 0.0393   Epoch: 7   Global Step: 92710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:29:02,913-Speed 2998.75 samples/sec   Loss 9.1433   LearningRate 0.0393   Epoch: 7   Global Step: 92720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:29:06,333-Speed 2995.05 samples/sec   Loss 9.1244   LearningRate 0.0393   Epoch: 7   Global Step: 92730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:29:09,739-Speed 3007.32 samples/sec   Loss 9.1748   LearningRate 0.0393   Epoch: 7   Global Step: 92740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:29:13,054-Speed 3090.09 samples/sec   Loss 9.1849   LearningRate 0.0393   Epoch: 7   Global Step: 92750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:29:16,495-Speed 2976.37 samples/sec   Loss 9.2155   LearningRate 0.0393   Epoch: 7   Global Step: 92760   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:29:19,893-Speed 3014.32 samples/sec   Loss 9.2942   LearningRate 0.0393   Epoch: 7   Global Step: 92770   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:29:23,242-Speed 3058.57 samples/sec   Loss 9.1171   LearningRate 0.0393   Epoch: 7   Global Step: 92780   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:29:26,690-Speed 2970.44 samples/sec   Loss 9.1296   LearningRate 0.0392   Epoch: 7   Global Step: 92790   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:29:29,995-Speed 3099.13 samples/sec   Loss 9.1198   LearningRate 0.0392   Epoch: 7   Global Step: 92800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:29:33,333-Speed 3069.01 samples/sec   Loss 9.1266   LearningRate 0.0392   Epoch: 7   Global Step: 92810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:29:36,807-Speed 2948.84 samples/sec   Loss 9.0528   LearningRate 0.0392   Epoch: 7   Global Step: 92820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:29:40,301-Speed 2931.41 samples/sec   Loss 9.0535   LearningRate 0.0392   Epoch: 7   Global Step: 92830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:29:43,776-Speed 2947.27 samples/sec   Loss 8.9605   LearningRate 0.0392   Epoch: 7   Global Step: 92840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:29:47,222-Speed 2972.99 samples/sec   Loss 9.2382   LearningRate 0.0392   Epoch: 7   Global Step: 92850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:29:50,664-Speed 2975.33 samples/sec   Loss 9.1902   LearningRate 0.0392   Epoch: 7   Global Step: 92860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:29:54,086-Speed 2993.88 samples/sec   Loss 9.1115   LearningRate 0.0392   Epoch: 7   Global Step: 92870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:29:57,491-Speed 3008.18 samples/sec   Loss 9.1018   LearningRate 0.0392   Epoch: 7   Global Step: 92880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:30:00,853-Speed 3046.57 samples/sec   Loss 9.1963   LearningRate 0.0392   Epoch: 7   Global Step: 92890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:30:04,310-Speed 2963.41 samples/sec   Loss 9.0929   LearningRate 0.0392   Epoch: 7   Global Step: 92900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:30:07,705-Speed 3016.53 samples/sec   Loss 9.1630   LearningRate 0.0392   Epoch: 7   Global Step: 92910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:30:11,188-Speed 2940.94 samples/sec   Loss 9.1327   LearningRate 0.0392   Epoch: 7   Global Step: 92920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:30:14,635-Speed 2971.49 samples/sec   Loss 9.0990   LearningRate 0.0392   Epoch: 7   Global Step: 92930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:30:17,978-Speed 3064.36 samples/sec   Loss 9.1975   LearningRate 0.0392   Epoch: 7   Global Step: 92940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:30:21,426-Speed 2970.40 samples/sec   Loss 9.0981   LearningRate 0.0392   Epoch: 7   Global Step: 92950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:30:24,786-Speed 3049.06 samples/sec   Loss 9.0719   LearningRate 0.0392   Epoch: 7   Global Step: 92960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:30:28,127-Speed 3065.31 samples/sec   Loss 9.0974   LearningRate 0.0392   Epoch: 7   Global Step: 92970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:30:31,538-Speed 3003.31 samples/sec   Loss 9.2131   LearningRate 0.0392   Epoch: 7   Global Step: 92980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:30:35,008-Speed 2951.96 samples/sec   Loss 9.0995   LearningRate 0.0391   Epoch: 7   Global Step: 92990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:30:38,381-Speed 3036.63 samples/sec   Loss 9.2087   LearningRate 0.0391   Epoch: 7   Global Step: 93000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:30:41,774-Speed 3018.39 samples/sec   Loss 9.2104   LearningRate 0.0391   Epoch: 7   Global Step: 93010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:30:45,185-Speed 3003.36 samples/sec   Loss 9.0020   LearningRate 0.0391   Epoch: 7   Global Step: 93020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:30:48,570-Speed 3025.21 samples/sec   Loss 9.0372   LearningRate 0.0391   Epoch: 7   Global Step: 93030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:30:52,043-Speed 2949.75 samples/sec   Loss 9.1835   LearningRate 0.0391   Epoch: 7   Global Step: 93040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:30:55,403-Speed 3048.51 samples/sec   Loss 9.1082   LearningRate 0.0391   Epoch: 7   Global Step: 93050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:30:58,859-Speed 2963.63 samples/sec   Loss 9.0743   LearningRate 0.0391   Epoch: 7   Global Step: 93060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:31:02,191-Speed 3074.18 samples/sec   Loss 9.1870   LearningRate 0.0391   Epoch: 7   Global Step: 93070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:31:05,572-Speed 3030.00 samples/sec   Loss 9.1448   LearningRate 0.0391   Epoch: 7   Global Step: 93080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:31:08,926-Speed 3053.63 samples/sec   Loss 9.0041   LearningRate 0.0391   Epoch: 7   Global Step: 93090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:31:12,273-Speed 3060.43 samples/sec   Loss 9.2301   LearningRate 0.0391   Epoch: 7   Global Step: 93100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:31:15,633-Speed 3048.25 samples/sec   Loss 9.0950   LearningRate 0.0391   Epoch: 7   Global Step: 93110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:31:19,155-Speed 2907.88 samples/sec   Loss 9.1434   LearningRate 0.0391   Epoch: 7   Global Step: 93120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:31:22,665-Speed 2918.27 samples/sec   Loss 9.1937   LearningRate 0.0391   Epoch: 7   Global Step: 93130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:31:26,164-Speed 2928.12 samples/sec   Loss 9.1677   LearningRate 0.0391   Epoch: 7   Global Step: 93140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:31:29,657-Speed 2931.72 samples/sec   Loss 9.2375   LearningRate 0.0391   Epoch: 7   Global Step: 93150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:31:33,032-Speed 3035.26 samples/sec   Loss 9.1377   LearningRate 0.0391   Epoch: 7   Global Step: 93160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:31:36,428-Speed 3016.30 samples/sec   Loss 9.0438   LearningRate 0.0391   Epoch: 7   Global Step: 93170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:31:39,864-Speed 2980.78 samples/sec   Loss 9.0954   LearningRate 0.0391   Epoch: 7   Global Step: 93180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:31:43,261-Speed 3014.99 samples/sec   Loss 9.0984   LearningRate 0.0390   Epoch: 7   Global Step: 93190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:31:46,678-Speed 2998.09 samples/sec   Loss 9.0707   LearningRate 0.0390   Epoch: 7   Global Step: 93200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:31:50,113-Speed 2982.13 samples/sec   Loss 9.1183   LearningRate 0.0390   Epoch: 7   Global Step: 93210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:31:53,521-Speed 3004.97 samples/sec   Loss 9.2240   LearningRate 0.0390   Epoch: 7   Global Step: 93220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:31:56,919-Speed 3014.20 samples/sec   Loss 9.1270   LearningRate 0.0390   Epoch: 7   Global Step: 93230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:32:00,390-Speed 2951.87 samples/sec   Loss 9.0673   LearningRate 0.0390   Epoch: 7   Global Step: 93240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:32:03,774-Speed 3027.08 samples/sec   Loss 9.1012   LearningRate 0.0390   Epoch: 7   Global Step: 93250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:32:07,233-Speed 2960.85 samples/sec   Loss 8.9304   LearningRate 0.0390   Epoch: 7   Global Step: 93260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:32:10,603-Speed 3039.87 samples/sec   Loss 9.1549   LearningRate 0.0390   Epoch: 7   Global Step: 93270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:32:14,058-Speed 2964.37 samples/sec   Loss 9.2380   LearningRate 0.0390   Epoch: 7   Global Step: 93280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:32:17,479-Speed 2994.09 samples/sec   Loss 9.0065   LearningRate 0.0390   Epoch: 7   Global Step: 93290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:32:20,896-Speed 2997.50 samples/sec   Loss 8.9620   LearningRate 0.0390   Epoch: 7   Global Step: 93300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:32:24,371-Speed 2947.78 samples/sec   Loss 9.1324   LearningRate 0.0390   Epoch: 7   Global Step: 93310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:32:27,762-Speed 3020.70 samples/sec   Loss 9.1725   LearningRate 0.0390   Epoch: 7   Global Step: 93320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:32:31,067-Speed 3099.24 samples/sec   Loss 9.1099   LearningRate 0.0390   Epoch: 7   Global Step: 93330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:32:34,511-Speed 2974.59 samples/sec   Loss 9.0156   LearningRate 0.0390   Epoch: 7   Global Step: 93340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:32:37,858-Speed 3060.11 samples/sec   Loss 9.2039   LearningRate 0.0390   Epoch: 7   Global Step: 93350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:32:41,295-Speed 2979.62 samples/sec   Loss 9.0942   LearningRate 0.0390   Epoch: 7   Global Step: 93360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:32:44,685-Speed 3021.33 samples/sec   Loss 9.0807   LearningRate 0.0390   Epoch: 7   Global Step: 93370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:32:48,165-Speed 2943.41 samples/sec   Loss 9.1144   LearningRate 0.0390   Epoch: 7   Global Step: 93380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:32:51,549-Speed 3026.66 samples/sec   Loss 9.1260   LearningRate 0.0389   Epoch: 7   Global Step: 93390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:32:54,910-Speed 3048.19 samples/sec   Loss 9.0993   LearningRate 0.0389   Epoch: 7   Global Step: 93400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:32:58,256-Speed 3060.45 samples/sec   Loss 9.1415   LearningRate 0.0389   Epoch: 7   Global Step: 93410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:33:01,678-Speed 2993.40 samples/sec   Loss 9.0866   LearningRate 0.0389   Epoch: 7   Global Step: 93420   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:05,044-Speed 3043.36 samples/sec   Loss 9.0557   LearningRate 0.0389   Epoch: 7   Global Step: 93430   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:08,392-Speed 3059.04 samples/sec   Loss 9.1366   LearningRate 0.0389   Epoch: 7   Global Step: 93440   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:11,813-Speed 2994.41 samples/sec   Loss 9.0954   LearningRate 0.0389   Epoch: 7   Global Step: 93450   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:15,234-Speed 2994.39 samples/sec   Loss 9.2174   LearningRate 0.0389   Epoch: 7   Global Step: 93460   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:18,697-Speed 2957.86 samples/sec   Loss 9.0233   LearningRate 0.0389   Epoch: 7   Global Step: 93470   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:22,088-Speed 3020.04 samples/sec   Loss 9.1072   LearningRate 0.0389   Epoch: 7   Global Step: 93480   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:25,525-Speed 2980.46 samples/sec   Loss 9.0992   LearningRate 0.0389   Epoch: 7   Global Step: 93490   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:28,957-Speed 2985.04 samples/sec   Loss 9.1474   LearningRate 0.0389   Epoch: 7   Global Step: 93500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:32,297-Speed 3066.77 samples/sec   Loss 9.1485   LearningRate 0.0389   Epoch: 7   Global Step: 93510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:35,712-Speed 2999.88 samples/sec   Loss 9.0463   LearningRate 0.0389   Epoch: 7   Global Step: 93520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:39,138-Speed 2989.84 samples/sec   Loss 9.0660   LearningRate 0.0389   Epoch: 7   Global Step: 93530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:42,500-Speed 3046.04 samples/sec   Loss 9.0353   LearningRate 0.0389   Epoch: 7   Global Step: 93540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:33:45,993-Speed 2932.81 samples/sec   Loss 9.1739   LearningRate 0.0389   Epoch: 7   Global Step: 93550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:33:49,414-Speed 2994.02 samples/sec   Loss 9.1664   LearningRate 0.0389   Epoch: 7   Global Step: 93560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:33:52,855-Speed 2976.73 samples/sec   Loss 9.0704   LearningRate 0.0389   Epoch: 7   Global Step: 93570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:33:56,250-Speed 3017.39 samples/sec   Loss 9.0066   LearningRate 0.0389   Epoch: 7   Global Step: 93580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:33:59,652-Speed 3010.71 samples/sec   Loss 9.0633   LearningRate 0.0388   Epoch: 7   Global Step: 93590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:34:03,120-Speed 2953.51 samples/sec   Loss 9.0086   LearningRate 0.0388   Epoch: 7   Global Step: 93600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:34:06,555-Speed 2982.38 samples/sec   Loss 9.0033   LearningRate 0.0388   Epoch: 7   Global Step: 93610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:34:09,977-Speed 2993.09 samples/sec   Loss 9.1273   LearningRate 0.0388   Epoch: 7   Global Step: 93620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:34:13,373-Speed 3016.24 samples/sec   Loss 9.0647   LearningRate 0.0388   Epoch: 7   Global Step: 93630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:34:16,766-Speed 3018.79 samples/sec   Loss 8.9985   LearningRate 0.0388   Epoch: 7   Global Step: 93640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:34:20,135-Speed 3039.87 samples/sec   Loss 9.0171   LearningRate 0.0388   Epoch: 7   Global Step: 93650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:34:23,457-Speed 3084.05 samples/sec   Loss 9.1409   LearningRate 0.0388   Epoch: 7   Global Step: 93660   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:34:26,832-Speed 3034.32 samples/sec   Loss 9.1382   LearningRate 0.0388   Epoch: 7   Global Step: 93670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:34:30,246-Speed 3000.67 samples/sec   Loss 9.0730   LearningRate 0.0388   Epoch: 7   Global Step: 93680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:34:33,582-Speed 3070.09 samples/sec   Loss 9.1795   LearningRate 0.0388   Epoch: 7   Global Step: 93690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:34:36,928-Speed 3060.86 samples/sec   Loss 9.1822   LearningRate 0.0388   Epoch: 7   Global Step: 93700   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:34:40,291-Speed 3046.67 samples/sec   Loss 9.1336   LearningRate 0.0388   Epoch: 7   Global Step: 93710   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:34:43,648-Speed 3050.91 samples/sec   Loss 8.9611   LearningRate 0.0388   Epoch: 7   Global Step: 93720   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:34:46,997-Speed 3058.03 samples/sec   Loss 9.1392   LearningRate 0.0388   Epoch: 7   Global Step: 93730   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:34:50,353-Speed 3052.50 samples/sec   Loss 9.0880   LearningRate 0.0388   Epoch: 7   Global Step: 93740   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:34:53,776-Speed 2992.24 samples/sec   Loss 9.1068   LearningRate 0.0388   Epoch: 7   Global Step: 93750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:34:57,184-Speed 3005.47 samples/sec   Loss 9.1039   LearningRate 0.0388   Epoch: 7   Global Step: 93760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:35:00,536-Speed 3056.48 samples/sec   Loss 9.2326   LearningRate 0.0388   Epoch: 7   Global Step: 93770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:35:03,962-Speed 2990.07 samples/sec   Loss 9.1298   LearningRate 0.0387   Epoch: 7   Global Step: 93780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:35:07,405-Speed 2974.46 samples/sec   Loss 9.0621   LearningRate 0.0387   Epoch: 7   Global Step: 93790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:35:10,834-Speed 2986.72 samples/sec   Loss 9.0674   LearningRate 0.0387   Epoch: 7   Global Step: 93800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:35:14,314-Speed 2944.05 samples/sec   Loss 9.1100   LearningRate 0.0387   Epoch: 7   Global Step: 93810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:35:17,780-Speed 2954.92 samples/sec   Loss 9.0283   LearningRate 0.0387   Epoch: 7   Global Step: 93820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:35:21,122-Speed 3065.42 samples/sec   Loss 9.1766   LearningRate 0.0387   Epoch: 7   Global Step: 93830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:35:24,569-Speed 2971.43 samples/sec   Loss 9.1073   LearningRate 0.0387   Epoch: 7   Global Step: 93840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:35:27,980-Speed 3002.40 samples/sec   Loss 9.0935   LearningRate 0.0387   Epoch: 7   Global Step: 93850   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:35:31,301-Speed 3084.79 samples/sec   Loss 9.0103   LearningRate 0.0387   Epoch: 7   Global Step: 93860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:35:34,677-Speed 3034.00 samples/sec   Loss 9.0700   LearningRate 0.0387   Epoch: 7   Global Step: 93870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:35:38,062-Speed 3025.87 samples/sec   Loss 9.0752   LearningRate 0.0387   Epoch: 7   Global Step: 93880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:35:41,405-Speed 3064.70 samples/sec   Loss 9.1399   LearningRate 0.0387   Epoch: 7   Global Step: 93890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:35:44,811-Speed 3007.78 samples/sec   Loss 9.0355   LearningRate 0.0387   Epoch: 7   Global Step: 93900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:35:48,206-Speed 3016.61 samples/sec   Loss 9.1584   LearningRate 0.0387   Epoch: 7   Global Step: 93910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:35:51,602-Speed 3016.73 samples/sec   Loss 8.9869   LearningRate 0.0387   Epoch: 7   Global Step: 93920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:35:54,970-Speed 3040.98 samples/sec   Loss 9.1106   LearningRate 0.0387   Epoch: 7   Global Step: 93930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:35:58,349-Speed 3031.80 samples/sec   Loss 9.1174   LearningRate 0.0387   Epoch: 7   Global Step: 93940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:36:01,792-Speed 2975.05 samples/sec   Loss 9.1257   LearningRate 0.0387   Epoch: 7   Global Step: 93950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 10:36:05,175-Speed 3027.92 samples/sec   Loss 9.0662   LearningRate 0.0387   Epoch: 7   Global Step: 93960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:36:08,643-Speed 2954.14 samples/sec   Loss 9.1535   LearningRate 0.0387   Epoch: 7   Global Step: 93970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:36:12,104-Speed 2959.17 samples/sec   Loss 8.9671   LearningRate 0.0386   Epoch: 7   Global Step: 93980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:36:15,492-Speed 3023.48 samples/sec   Loss 9.0323   LearningRate 0.0386   Epoch: 7   Global Step: 93990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:36:18,927-Speed 2981.30 samples/sec   Loss 9.0353   LearningRate 0.0386   Epoch: 7   Global Step: 94000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:36:22,287-Speed 3048.44 samples/sec   Loss 9.0450   LearningRate 0.0386   Epoch: 7   Global Step: 94010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:36:25,673-Speed 3025.28 samples/sec   Loss 8.9127   LearningRate 0.0386   Epoch: 7   Global Step: 94020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:36:29,098-Speed 2990.75 samples/sec   Loss 9.0529   LearningRate 0.0386   Epoch: 7   Global Step: 94030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:36:32,555-Speed 2962.67 samples/sec   Loss 9.1057   LearningRate 0.0386   Epoch: 7   Global Step: 94040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:36:35,988-Speed 2983.91 samples/sec   Loss 8.9104   LearningRate 0.0386   Epoch: 7   Global Step: 94050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 10:36:39,439-Speed 2968.19 samples/sec   Loss 9.1238   LearningRate 0.0386   Epoch: 7   Global Step: 94060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:36:42,922-Speed 2940.81 samples/sec   Loss 9.1491   LearningRate 0.0386   Epoch: 7   Global Step: 94070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:36:46,371-Speed 2969.50 samples/sec   Loss 9.1184   LearningRate 0.0386   Epoch: 7   Global Step: 94080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:36:49,864-Speed 2932.16 samples/sec   Loss 9.1662   LearningRate 0.0386   Epoch: 7   Global Step: 94090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:36:53,241-Speed 3034.01 samples/sec   Loss 9.1225   LearningRate 0.0386   Epoch: 7   Global Step: 94100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:36:56,713-Speed 2949.88 samples/sec   Loss 8.9696   LearningRate 0.0386   Epoch: 7   Global Step: 94110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:37:00,132-Speed 2996.35 samples/sec   Loss 9.0880   LearningRate 0.0386   Epoch: 7   Global Step: 94120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:37:03,470-Speed 3068.51 samples/sec   Loss 9.0866   LearningRate 0.0386   Epoch: 7   Global Step: 94130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:37:06,909-Speed 2978.67 samples/sec   Loss 9.1557   LearningRate 0.0386   Epoch: 7   Global Step: 94140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:37:10,315-Speed 3007.21 samples/sec   Loss 9.1439   LearningRate 0.0386   Epoch: 7   Global Step: 94150   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:37:13,781-Speed 2955.24 samples/sec   Loss 9.0435   LearningRate 0.0386   Epoch: 7   Global Step: 94160   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:37:17,157-Speed 3034.20 samples/sec   Loss 9.1770   LearningRate 0.0386   Epoch: 7   Global Step: 94170   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:37:20,532-Speed 3035.04 samples/sec   Loss 9.0096   LearningRate 0.0385   Epoch: 7   Global Step: 94180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:37:24,028-Speed 2929.77 samples/sec   Loss 9.1201   LearningRate 0.0385   Epoch: 7   Global Step: 94190   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:37:27,439-Speed 3005.84 samples/sec   Loss 9.0338   LearningRate 0.0385   Epoch: 7   Global Step: 94200   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:37:30,819-Speed 3031.33 samples/sec   Loss 9.1763   LearningRate 0.0385   Epoch: 7   Global Step: 94210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 10:37:34,251-Speed 2984.19 samples/sec   Loss 9.1406   LearningRate 0.0385   Epoch: 7   Global Step: 94220   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:37:37,673-Speed 2993.76 samples/sec   Loss 9.1738   LearningRate 0.0385   Epoch: 7   Global Step: 94230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:37:41,038-Speed 3043.38 samples/sec   Loss 8.9329   LearningRate 0.0385   Epoch: 7   Global Step: 94240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:37:44,545-Speed 2920.57 samples/sec   Loss 9.0980   LearningRate 0.0385   Epoch: 7   Global Step: 94250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:37:48,018-Speed 2949.99 samples/sec   Loss 9.0263   LearningRate 0.0385   Epoch: 7   Global Step: 94260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:37:51,417-Speed 3012.79 samples/sec   Loss 9.1357   LearningRate 0.0385   Epoch: 7   Global Step: 94270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:37:54,831-Speed 2999.84 samples/sec   Loss 9.1105   LearningRate 0.0385   Epoch: 7   Global Step: 94280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:37:58,218-Speed 3025.24 samples/sec   Loss 9.0531   LearningRate 0.0385   Epoch: 7   Global Step: 94290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:01,682-Speed 2957.01 samples/sec   Loss 9.1847   LearningRate 0.0385   Epoch: 7   Global Step: 94300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:05,111-Speed 2986.56 samples/sec   Loss 9.0214   LearningRate 0.0385   Epoch: 7   Global Step: 94310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:08,547-Speed 2981.11 samples/sec   Loss 8.9461   LearningRate 0.0385   Epoch: 7   Global Step: 94320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:11,897-Speed 3057.76 samples/sec   Loss 9.0130   LearningRate 0.0385   Epoch: 7   Global Step: 94330   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:38:15,283-Speed 3024.76 samples/sec   Loss 8.9391   LearningRate 0.0385   Epoch: 7   Global Step: 94340   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:38:18,682-Speed 3013.59 samples/sec   Loss 9.0081   LearningRate 0.0385   Epoch: 7   Global Step: 94350   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:38:22,045-Speed 3046.33 samples/sec   Loss 9.0310   LearningRate 0.0385   Epoch: 7   Global Step: 94360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:25,436-Speed 3020.15 samples/sec   Loss 9.0724   LearningRate 0.0385   Epoch: 7   Global Step: 94370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:28,847-Speed 3003.24 samples/sec   Loss 9.1009   LearningRate 0.0384   Epoch: 7   Global Step: 94380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:32,264-Speed 2997.59 samples/sec   Loss 9.1198   LearningRate 0.0384   Epoch: 7   Global Step: 94390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:35,670-Speed 3007.62 samples/sec   Loss 9.0412   LearningRate 0.0384   Epoch: 7   Global Step: 94400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:39,041-Speed 3038.22 samples/sec   Loss 9.0532   LearningRate 0.0384   Epoch: 7   Global Step: 94410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:42,357-Speed 3089.59 samples/sec   Loss 9.0688   LearningRate 0.0384   Epoch: 7   Global Step: 94420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:45,827-Speed 2951.34 samples/sec   Loss 8.9770   LearningRate 0.0384   Epoch: 7   Global Step: 94430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:49,220-Speed 3018.78 samples/sec   Loss 9.0130   LearningRate 0.0384   Epoch: 7   Global Step: 94440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:52,635-Speed 2999.53 samples/sec   Loss 9.0987   LearningRate 0.0384   Epoch: 7   Global Step: 94450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:38:56,008-Speed 3037.10 samples/sec   Loss 9.1258   LearningRate 0.0384   Epoch: 7   Global Step: 94460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:38:59,449-Speed 2976.54 samples/sec   Loss 9.1073   LearningRate 0.0384   Epoch: 7   Global Step: 94470   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:39:02,804-Speed 3053.65 samples/sec   Loss 9.0276   LearningRate 0.0384   Epoch: 7   Global Step: 94480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:39:06,148-Speed 3063.17 samples/sec   Loss 9.1194   LearningRate 0.0384   Epoch: 7   Global Step: 94490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:39:09,667-Speed 2910.56 samples/sec   Loss 9.0589   LearningRate 0.0384   Epoch: 7   Global Step: 94500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:39:13,000-Speed 3072.90 samples/sec   Loss 9.0308   LearningRate 0.0384   Epoch: 7   Global Step: 94510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:39:16,377-Speed 3033.40 samples/sec   Loss 8.9862   LearningRate 0.0384   Epoch: 7   Global Step: 94520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:39:19,749-Speed 3037.63 samples/sec   Loss 9.0547   LearningRate 0.0384   Epoch: 7   Global Step: 94530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:39:23,130-Speed 3028.91 samples/sec   Loss 9.1080   LearningRate 0.0384   Epoch: 7   Global Step: 94540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:39:26,612-Speed 2941.85 samples/sec   Loss 8.9770   LearningRate 0.0384   Epoch: 7   Global Step: 94550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:39:30,070-Speed 2962.37 samples/sec   Loss 9.0986   LearningRate 0.0384   Epoch: 7   Global Step: 94560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:39:33,421-Speed 3056.24 samples/sec   Loss 8.9825   LearningRate 0.0384   Epoch: 7   Global Step: 94570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:39:36,841-Speed 2995.43 samples/sec   Loss 9.0588   LearningRate 0.0384   Epoch: 7   Global Step: 94580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:39:40,275-Speed 2983.15 samples/sec   Loss 9.0232   LearningRate 0.0383   Epoch: 7   Global Step: 94590   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:39:43,741-Speed 2955.62 samples/sec   Loss 9.0064   LearningRate 0.0383   Epoch: 7   Global Step: 94600   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:39:47,120-Speed 3031.41 samples/sec   Loss 8.9790   LearningRate 0.0383   Epoch: 7   Global Step: 94610   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:39:50,558-Speed 2979.06 samples/sec   Loss 8.9276   LearningRate 0.0383   Epoch: 7   Global Step: 94620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:39:53,986-Speed 2987.38 samples/sec   Loss 9.0510   LearningRate 0.0383   Epoch: 7   Global Step: 94630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:39:57,390-Speed 3009.83 samples/sec   Loss 9.0849   LearningRate 0.0383   Epoch: 7   Global Step: 94640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:40:00,854-Speed 2956.91 samples/sec   Loss 9.1913   LearningRate 0.0383   Epoch: 7   Global Step: 94650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:40:04,248-Speed 3017.90 samples/sec   Loss 9.1780   LearningRate 0.0383   Epoch: 7   Global Step: 94660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:40:07,745-Speed 2929.36 samples/sec   Loss 9.1287   LearningRate 0.0383   Epoch: 7   Global Step: 94670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:40:11,167-Speed 2992.78 samples/sec   Loss 8.8825   LearningRate 0.0383   Epoch: 7   Global Step: 94680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:40:14,565-Speed 3014.66 samples/sec   Loss 9.1767   LearningRate 0.0383   Epoch: 7   Global Step: 94690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:40:18,046-Speed 2942.84 samples/sec   Loss 9.0083   LearningRate 0.0383   Epoch: 7   Global Step: 94700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:40:21,465-Speed 2995.96 samples/sec   Loss 9.1165   LearningRate 0.0383   Epoch: 7   Global Step: 94710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:40:24,891-Speed 2989.40 samples/sec   Loss 9.1286   LearningRate 0.0383   Epoch: 7   Global Step: 94720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:40:28,217-Speed 3081.64 samples/sec   Loss 8.8765   LearningRate 0.0383   Epoch: 7   Global Step: 94730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:40:31,537-Speed 3085.29 samples/sec   Loss 9.0445   LearningRate 0.0383   Epoch: 7   Global Step: 94740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:40:34,882-Speed 3062.13 samples/sec   Loss 8.9979   LearningRate 0.0383   Epoch: 7   Global Step: 94750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:40:38,391-Speed 2919.23 samples/sec   Loss 9.0918   LearningRate 0.0383   Epoch: 7   Global Step: 94760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:40:41,751-Speed 3048.16 samples/sec   Loss 9.0605   LearningRate 0.0383   Epoch: 7   Global Step: 94770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:40:45,154-Speed 3010.46 samples/sec   Loss 9.1920   LearningRate 0.0383   Epoch: 7   Global Step: 94780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:40:48,523-Speed 3040.40 samples/sec   Loss 9.2203   LearningRate 0.0382   Epoch: 7   Global Step: 94790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:40:51,949-Speed 2989.15 samples/sec   Loss 9.0910   LearningRate 0.0382   Epoch: 7   Global Step: 94800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:40:55,397-Speed 2970.35 samples/sec   Loss 8.9893   LearningRate 0.0382   Epoch: 7   Global Step: 94810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:40:58,881-Speed 2940.82 samples/sec   Loss 8.8941   LearningRate 0.0382   Epoch: 7   Global Step: 94820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:41:02,276-Speed 3016.56 samples/sec   Loss 8.9007   LearningRate 0.0382   Epoch: 7   Global Step: 94830   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:41:05,771-Speed 2930.69 samples/sec   Loss 8.9639   LearningRate 0.0382   Epoch: 7   Global Step: 94840   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:41:09,168-Speed 3015.33 samples/sec   Loss 9.0463   LearningRate 0.0382   Epoch: 7   Global Step: 94850   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:41:12,677-Speed 2918.95 samples/sec   Loss 9.0711   LearningRate 0.0382   Epoch: 7   Global Step: 94860   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:41:16,082-Speed 3008.63 samples/sec   Loss 9.1328   LearningRate 0.0382   Epoch: 7   Global Step: 94870   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:41:19,465-Speed 3027.68 samples/sec   Loss 8.9984   LearningRate 0.0382   Epoch: 7   Global Step: 94880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:41:22,915-Speed 2968.96 samples/sec   Loss 9.0730   LearningRate 0.0382   Epoch: 7   Global Step: 94890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:41:26,329-Speed 3000.04 samples/sec   Loss 9.1526   LearningRate 0.0382   Epoch: 7   Global Step: 94900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:41:29,810-Speed 2943.14 samples/sec   Loss 9.0606   LearningRate 0.0382   Epoch: 7   Global Step: 94910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:41:33,216-Speed 3006.74 samples/sec   Loss 9.1043   LearningRate 0.0382   Epoch: 7   Global Step: 94920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:41:36,714-Speed 2928.58 samples/sec   Loss 9.0251   LearningRate 0.0382   Epoch: 7   Global Step: 94930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:41:40,112-Speed 3014.61 samples/sec   Loss 8.8790   LearningRate 0.0382   Epoch: 7   Global Step: 94940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:41:43,486-Speed 3035.25 samples/sec   Loss 9.0532   LearningRate 0.0382   Epoch: 7   Global Step: 94950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:41:46,870-Speed 3026.83 samples/sec   Loss 9.0556   LearningRate 0.0382   Epoch: 7   Global Step: 94960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:41:50,293-Speed 2992.22 samples/sec   Loss 9.0258   LearningRate 0.0382   Epoch: 7   Global Step: 94970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:41:53,727-Speed 2982.49 samples/sec   Loss 9.0169   LearningRate 0.0382   Epoch: 7   Global Step: 94980   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:41:57,211-Speed 2940.41 samples/sec   Loss 9.1682   LearningRate 0.0381   Epoch: 7   Global Step: 94990   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:42:00,688-Speed 2946.13 samples/sec   Loss 8.9097   LearningRate 0.0381   Epoch: 7   Global Step: 95000   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:42:04,178-Speed 2934.37 samples/sec   Loss 9.0211   LearningRate 0.0381   Epoch: 7   Global Step: 95010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:42:07,530-Speed 3056.24 samples/sec   Loss 8.9912   LearningRate 0.0381   Epoch: 7   Global Step: 95020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:10,898-Speed 3040.92 samples/sec   Loss 8.9602   LearningRate 0.0381   Epoch: 7   Global Step: 95030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:14,253-Speed 3053.67 samples/sec   Loss 9.1303   LearningRate 0.0381   Epoch: 7   Global Step: 95040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:17,671-Speed 2996.43 samples/sec   Loss 9.0646   LearningRate 0.0381   Epoch: 7   Global Step: 95050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:21,032-Speed 3047.62 samples/sec   Loss 8.9500   LearningRate 0.0381   Epoch: 7   Global Step: 95060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:24,531-Speed 2927.36 samples/sec   Loss 9.1213   LearningRate 0.0381   Epoch: 7   Global Step: 95070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:27,903-Speed 3037.53 samples/sec   Loss 8.9328   LearningRate 0.0381   Epoch: 7   Global Step: 95080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:31,257-Speed 3053.92 samples/sec   Loss 8.9522   LearningRate 0.0381   Epoch: 7   Global Step: 95090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:34,627-Speed 3039.37 samples/sec   Loss 8.9133   LearningRate 0.0381   Epoch: 7   Global Step: 95100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:38,018-Speed 3020.52 samples/sec   Loss 8.9986   LearningRate 0.0381   Epoch: 7   Global Step: 95110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:41,433-Speed 2999.49 samples/sec   Loss 9.0202   LearningRate 0.0381   Epoch: 7   Global Step: 95120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:44,763-Speed 3076.57 samples/sec   Loss 8.9807   LearningRate 0.0381   Epoch: 7   Global Step: 95130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:48,121-Speed 3049.86 samples/sec   Loss 9.0497   LearningRate 0.0381   Epoch: 7   Global Step: 95140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:51,457-Speed 3070.25 samples/sec   Loss 8.9372   LearningRate 0.0381   Epoch: 7   Global Step: 95150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:54,892-Speed 2981.92 samples/sec   Loss 9.0479   LearningRate 0.0381   Epoch: 7   Global Step: 95160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:42:58,253-Speed 3047.30 samples/sec   Loss 9.0341   LearningRate 0.0381   Epoch: 7   Global Step: 95170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:43:01,697-Speed 2973.77 samples/sec   Loss 9.1452   LearningRate 0.0381   Epoch: 7   Global Step: 95180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:43:05,034-Speed 3069.93 samples/sec   Loss 9.0402   LearningRate 0.0380   Epoch: 7   Global Step: 95190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:43:08,360-Speed 3079.87 samples/sec   Loss 8.9677   LearningRate 0.0380   Epoch: 7   Global Step: 95200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:43:11,705-Speed 3061.51 samples/sec   Loss 8.9601   LearningRate 0.0380   Epoch: 7   Global Step: 95210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:43:15,052-Speed 3060.44 samples/sec   Loss 9.1013   LearningRate 0.0380   Epoch: 7   Global Step: 95220   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:43:18,433-Speed 3029.88 samples/sec   Loss 9.0844   LearningRate 0.0380   Epoch: 7   Global Step: 95230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:43:21,775-Speed 3064.76 samples/sec   Loss 9.0727   LearningRate 0.0380   Epoch: 7   Global Step: 95240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:43:25,087-Speed 3092.10 samples/sec   Loss 8.9377   LearningRate 0.0380   Epoch: 7   Global Step: 95250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:43:28,410-Speed 3083.14 samples/sec   Loss 9.0534   LearningRate 0.0380   Epoch: 7   Global Step: 95260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:43:31,797-Speed 3023.41 samples/sec   Loss 8.8795   LearningRate 0.0380   Epoch: 7   Global Step: 95270   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 10:43:35,200-Speed 3010.35 samples/sec   Loss 9.0790   LearningRate 0.0380   Epoch: 7   Global Step: 95280   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 10:43:38,626-Speed 2989.14 samples/sec   Loss 9.0168   LearningRate 0.0380   Epoch: 7   Global Step: 95290   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 10:43:42,081-Speed 2965.00 samples/sec   Loss 8.9134   LearningRate 0.0380   Epoch: 7   Global Step: 95300   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 10:43:45,447-Speed 3042.98 samples/sec   Loss 8.9751   LearningRate 0.0380   Epoch: 7   Global Step: 95310   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 10:43:48,922-Speed 2947.49 samples/sec   Loss 9.0108   LearningRate 0.0380   Epoch: 7   Global Step: 95320   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 10:43:52,356-Speed 2982.81 samples/sec   Loss 9.0234   LearningRate 0.0380   Epoch: 7   Global Step: 95330   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 10:43:55,732-Speed 3033.72 samples/sec   Loss 8.9598   LearningRate 0.0380   Epoch: 7   Global Step: 95340   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 10:43:59,057-Speed 3080.07 samples/sec   Loss 9.0239   LearningRate 0.0380   Epoch: 7   Global Step: 95350   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 10:44:02,466-Speed 3004.78 samples/sec   Loss 9.0595   LearningRate 0.0380   Epoch: 7   Global Step: 95360   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 10:44:05,890-Speed 2991.55 samples/sec   Loss 8.9803   LearningRate 0.0380   Epoch: 7   Global Step: 95370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:44:09,235-Speed 3061.67 samples/sec   Loss 9.0005   LearningRate 0.0380   Epoch: 7   Global Step: 95380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:44:12,697-Speed 2958.80 samples/sec   Loss 9.0770   LearningRate 0.0379   Epoch: 7   Global Step: 95390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:44:16,042-Speed 3062.41 samples/sec   Loss 9.0644   LearningRate 0.0379   Epoch: 7   Global Step: 95400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:44:19,468-Speed 2989.50 samples/sec   Loss 9.0899   LearningRate 0.0379   Epoch: 7   Global Step: 95410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:44:22,869-Speed 3011.30 samples/sec   Loss 9.0234   LearningRate 0.0379   Epoch: 7   Global Step: 95420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:44:26,212-Speed 3063.92 samples/sec   Loss 8.9058   LearningRate 0.0379   Epoch: 7   Global Step: 95430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:44:29,592-Speed 3030.37 samples/sec   Loss 8.9857   LearningRate 0.0379   Epoch: 7   Global Step: 95440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:44:33,049-Speed 2963.12 samples/sec   Loss 8.9548   LearningRate 0.0379   Epoch: 7   Global Step: 95450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:44:36,504-Speed 2964.60 samples/sec   Loss 9.0498   LearningRate 0.0379   Epoch: 7   Global Step: 95460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:44:39,863-Speed 3049.52 samples/sec   Loss 9.1856   LearningRate 0.0379   Epoch: 7   Global Step: 95470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:44:43,290-Speed 2988.70 samples/sec   Loss 8.9760   LearningRate 0.0379   Epoch: 7   Global Step: 95480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:44:46,692-Speed 3011.14 samples/sec   Loss 9.1386   LearningRate 0.0379   Epoch: 7   Global Step: 95490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:44:50,144-Speed 2966.81 samples/sec   Loss 8.9290   LearningRate 0.0379   Epoch: 7   Global Step: 95500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:44:53,569-Speed 2991.25 samples/sec   Loss 9.0613   LearningRate 0.0379   Epoch: 7   Global Step: 95510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:44:56,941-Speed 3037.34 samples/sec   Loss 9.0229   LearningRate 0.0379   Epoch: 7   Global Step: 95520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:45:00,352-Speed 3003.25 samples/sec   Loss 9.0167   LearningRate 0.0379   Epoch: 7   Global Step: 95530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:45:03,803-Speed 2967.99 samples/sec   Loss 8.9183   LearningRate 0.0379   Epoch: 7   Global Step: 95540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:45:07,305-Speed 2924.66 samples/sec   Loss 8.9986   LearningRate 0.0379   Epoch: 7   Global Step: 95550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:45:10,747-Speed 2976.06 samples/sec   Loss 8.9932   LearningRate 0.0379   Epoch: 7   Global Step: 95560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:45:14,107-Speed 3048.64 samples/sec   Loss 9.0863   LearningRate 0.0379   Epoch: 7   Global Step: 95570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:45:17,409-Speed 3102.69 samples/sec   Loss 9.1024   LearningRate 0.0379   Epoch: 7   Global Step: 95580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:45:20,864-Speed 2965.36 samples/sec   Loss 8.9847   LearningRate 0.0378   Epoch: 7   Global Step: 95590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:45:24,282-Speed 2996.31 samples/sec   Loss 9.0416   LearningRate 0.0378   Epoch: 7   Global Step: 95600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:45:27,642-Speed 3048.05 samples/sec   Loss 8.8679   LearningRate 0.0378   Epoch: 7   Global Step: 95610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:45:31,114-Speed 2950.41 samples/sec   Loss 8.9841   LearningRate 0.0378   Epoch: 7   Global Step: 95620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:45:34,532-Speed 2996.56 samples/sec   Loss 8.9684   LearningRate 0.0378   Epoch: 7   Global Step: 95630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:45:37,934-Speed 3010.96 samples/sec   Loss 8.9954   LearningRate 0.0378   Epoch: 7   Global Step: 95640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:45:41,389-Speed 2964.43 samples/sec   Loss 8.9827   LearningRate 0.0378   Epoch: 7   Global Step: 95650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:45:44,807-Speed 2996.39 samples/sec   Loss 8.9527   LearningRate 0.0378   Epoch: 7   Global Step: 95660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:45:48,186-Speed 3031.70 samples/sec   Loss 8.9224   LearningRate 0.0378   Epoch: 7   Global Step: 95670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:45:51,694-Speed 2920.05 samples/sec   Loss 8.9731   LearningRate 0.0378   Epoch: 7   Global Step: 95680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:45:55,078-Speed 3026.12 samples/sec   Loss 9.0709   LearningRate 0.0378   Epoch: 7   Global Step: 95690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:45:58,555-Speed 2945.86 samples/sec   Loss 8.8825   LearningRate 0.0378   Epoch: 7   Global Step: 95700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:46:01,931-Speed 3034.74 samples/sec   Loss 8.9375   LearningRate 0.0378   Epoch: 7   Global Step: 95710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:46:05,305-Speed 3035.61 samples/sec   Loss 8.8990   LearningRate 0.0378   Epoch: 7   Global Step: 95720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:46:08,664-Speed 3049.01 samples/sec   Loss 8.9831   LearningRate 0.0378   Epoch: 7   Global Step: 95730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:46:12,049-Speed 3026.15 samples/sec   Loss 9.0505   LearningRate 0.0378   Epoch: 7   Global Step: 95740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:46:15,498-Speed 2969.72 samples/sec   Loss 9.0207   LearningRate 0.0378   Epoch: 7   Global Step: 95750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:46:18,951-Speed 2966.35 samples/sec   Loss 9.0116   LearningRate 0.0378   Epoch: 7   Global Step: 95760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:46:22,287-Speed 3070.07 samples/sec   Loss 8.8965   LearningRate 0.0378   Epoch: 7   Global Step: 95770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:46:25,701-Speed 3000.02 samples/sec   Loss 8.9117   LearningRate 0.0378   Epoch: 7   Global Step: 95780   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:46:29,163-Speed 2958.85 samples/sec   Loss 9.0363   LearningRate 0.0377   Epoch: 7   Global Step: 95790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:46:32,609-Speed 2972.67 samples/sec   Loss 8.9679   LearningRate 0.0377   Epoch: 7   Global Step: 95800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:46:36,061-Speed 2966.92 samples/sec   Loss 9.0483   LearningRate 0.0377   Epoch: 7   Global Step: 95810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:46:39,536-Speed 2948.14 samples/sec   Loss 8.9975   LearningRate 0.0377   Epoch: 7   Global Step: 95820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:46:42,868-Speed 3074.38 samples/sec   Loss 8.9927   LearningRate 0.0377   Epoch: 7   Global Step: 95830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:46:46,270-Speed 3010.45 samples/sec   Loss 8.8690   LearningRate 0.0377   Epoch: 7   Global Step: 95840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:46:49,611-Speed 3066.28 samples/sec   Loss 8.7889   LearningRate 0.0377   Epoch: 7   Global Step: 95850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:46:52,953-Speed 3064.74 samples/sec   Loss 8.9345   LearningRate 0.0377   Epoch: 7   Global Step: 95860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:46:56,334-Speed 3029.09 samples/sec   Loss 8.8863   LearningRate 0.0377   Epoch: 7   Global Step: 95870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:46:59,763-Speed 2987.64 samples/sec   Loss 8.8639   LearningRate 0.0377   Epoch: 7   Global Step: 95880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:47:03,178-Speed 2999.03 samples/sec   Loss 8.9642   LearningRate 0.0377   Epoch: 7   Global Step: 95890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:47:06,514-Speed 3070.43 samples/sec   Loss 9.0393   LearningRate 0.0377   Epoch: 7   Global Step: 95900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:47:09,912-Speed 3013.96 samples/sec   Loss 8.9411   LearningRate 0.0377   Epoch: 7   Global Step: 95910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:47:13,371-Speed 2961.71 samples/sec   Loss 8.8616   LearningRate 0.0377   Epoch: 7   Global Step: 95920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:47:16,733-Speed 3046.47 samples/sec   Loss 9.0077   LearningRate 0.0377   Epoch: 7   Global Step: 95930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:47:20,098-Speed 3043.54 samples/sec   Loss 8.9670   LearningRate 0.0377   Epoch: 7   Global Step: 95940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:47:23,426-Speed 3078.12 samples/sec   Loss 9.0444   LearningRate 0.0377   Epoch: 7   Global Step: 95950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:47:26,758-Speed 3074.06 samples/sec   Loss 9.0666   LearningRate 0.0377   Epoch: 7   Global Step: 95960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:47:30,126-Speed 3040.99 samples/sec   Loss 9.0383   LearningRate 0.0377   Epoch: 7   Global Step: 95970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:47:33,477-Speed 3056.73 samples/sec   Loss 8.9632   LearningRate 0.0377   Epoch: 7   Global Step: 95980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:47:36,817-Speed 3066.47 samples/sec   Loss 8.7649   LearningRate 0.0377   Epoch: 7   Global Step: 95990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:47:40,145-Speed 3078.19 samples/sec   Loss 9.0547   LearningRate 0.0376   Epoch: 7   Global Step: 96000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:47:43,618-Speed 2949.05 samples/sec   Loss 8.9745   LearningRate 0.0376   Epoch: 7   Global Step: 96010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:47:47,041-Speed 2991.99 samples/sec   Loss 8.9052   LearningRate 0.0376   Epoch: 7   Global Step: 96020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:47:50,382-Speed 3066.29 samples/sec   Loss 8.9863   LearningRate 0.0376   Epoch: 7   Global Step: 96030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:47:53,734-Speed 3055.93 samples/sec   Loss 9.0658   LearningRate 0.0376   Epoch: 7   Global Step: 96040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:47:57,088-Speed 3053.36 samples/sec   Loss 9.0012   LearningRate 0.0376   Epoch: 7   Global Step: 96050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:48:00,435-Speed 3060.61 samples/sec   Loss 8.8561   LearningRate 0.0376   Epoch: 7   Global Step: 96060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:48:03,797-Speed 3046.67 samples/sec   Loss 8.9812   LearningRate 0.0376   Epoch: 7   Global Step: 96070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:48:07,151-Speed 3053.72 samples/sec   Loss 8.8853   LearningRate 0.0376   Epoch: 7   Global Step: 96080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:48:10,494-Speed 3064.01 samples/sec   Loss 8.9358   LearningRate 0.0376   Epoch: 7   Global Step: 96090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:48:13,795-Speed 3103.68 samples/sec   Loss 8.9579   LearningRate 0.0376   Epoch: 7   Global Step: 96100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:48:17,221-Speed 2989.22 samples/sec   Loss 8.9575   LearningRate 0.0376   Epoch: 7   Global Step: 96110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:48:20,627-Speed 3008.24 samples/sec   Loss 8.9976   LearningRate 0.0376   Epoch: 7   Global Step: 96120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:48:24,057-Speed 2986.22 samples/sec   Loss 8.8650   LearningRate 0.0376   Epoch: 7   Global Step: 96130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:48:27,516-Speed 2960.48 samples/sec   Loss 8.7413   LearningRate 0.0376   Epoch: 7   Global Step: 96140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:48:30,904-Speed 3023.29 samples/sec   Loss 8.8684   LearningRate 0.0376   Epoch: 7   Global Step: 96150   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:48:34,355-Speed 2968.93 samples/sec   Loss 8.9019   LearningRate 0.0376   Epoch: 7   Global Step: 96160   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:48:37,729-Speed 3035.26 samples/sec   Loss 8.9043   LearningRate 0.0376   Epoch: 7   Global Step: 96170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:48:41,073-Speed 3063.84 samples/sec   Loss 8.9802   LearningRate 0.0376   Epoch: 7   Global Step: 96180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:48:44,429-Speed 3051.50 samples/sec   Loss 8.8697   LearningRate 0.0376   Epoch: 7   Global Step: 96190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:48:47,755-Speed 3079.60 samples/sec   Loss 8.9479   LearningRate 0.0375   Epoch: 7   Global Step: 96200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:48:51,164-Speed 3004.80 samples/sec   Loss 8.9505   LearningRate 0.0375   Epoch: 7   Global Step: 96210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:48:54,502-Speed 3069.16 samples/sec   Loss 8.9376   LearningRate 0.0375   Epoch: 7   Global Step: 96220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:48:57,910-Speed 3005.37 samples/sec   Loss 8.9089   LearningRate 0.0375   Epoch: 7   Global Step: 96230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:49:01,312-Speed 3010.95 samples/sec   Loss 8.9416   LearningRate 0.0375   Epoch: 7   Global Step: 96240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:49:04,704-Speed 3019.89 samples/sec   Loss 9.0191   LearningRate 0.0375   Epoch: 7   Global Step: 96250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:49:08,113-Speed 3004.70 samples/sec   Loss 8.8243   LearningRate 0.0375   Epoch: 7   Global Step: 96260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:49:11,516-Speed 3010.49 samples/sec   Loss 8.9309   LearningRate 0.0375   Epoch: 7   Global Step: 96270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:49:14,890-Speed 3035.42 samples/sec   Loss 8.9723   LearningRate 0.0375   Epoch: 7   Global Step: 96280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:49:18,284-Speed 3018.51 samples/sec   Loss 9.0280   LearningRate 0.0375   Epoch: 7   Global Step: 96290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:49:21,707-Speed 2991.64 samples/sec   Loss 8.8888   LearningRate 0.0375   Epoch: 7   Global Step: 96300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:49:25,136-Speed 2987.73 samples/sec   Loss 9.0155   LearningRate 0.0375   Epoch: 7   Global Step: 96310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:49:28,495-Speed 3049.06 samples/sec   Loss 8.9860   LearningRate 0.0375   Epoch: 7   Global Step: 96320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:49:31,873-Speed 3032.22 samples/sec   Loss 8.9256   LearningRate 0.0375   Epoch: 7   Global Step: 96330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:49:35,269-Speed 3016.54 samples/sec   Loss 8.9875   LearningRate 0.0375   Epoch: 7   Global Step: 96340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:49:38,699-Speed 2985.74 samples/sec   Loss 8.8903   LearningRate 0.0375   Epoch: 7   Global Step: 96350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:49:42,081-Speed 3029.33 samples/sec   Loss 8.8658   LearningRate 0.0375   Epoch: 7   Global Step: 96360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:49:45,426-Speed 3061.81 samples/sec   Loss 8.9962   LearningRate 0.0375   Epoch: 7   Global Step: 96370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:49:48,828-Speed 3010.99 samples/sec   Loss 8.9236   LearningRate 0.0375   Epoch: 7   Global Step: 96380   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:49:52,241-Speed 3000.05 samples/sec   Loss 8.9304   LearningRate 0.0375   Epoch: 7   Global Step: 96390   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:49:55,645-Speed 3009.50 samples/sec   Loss 8.9523   LearningRate 0.0374   Epoch: 7   Global Step: 96400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:49:59,076-Speed 2984.95 samples/sec   Loss 8.9291   LearningRate 0.0374   Epoch: 7   Global Step: 96410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:50:02,451-Speed 3034.85 samples/sec   Loss 8.8429   LearningRate 0.0374   Epoch: 7   Global Step: 96420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:50:05,925-Speed 2948.81 samples/sec   Loss 8.8831   LearningRate 0.0374   Epoch: 7   Global Step: 96430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:50:09,300-Speed 3035.28 samples/sec   Loss 8.7889   LearningRate 0.0374   Epoch: 7   Global Step: 96440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:50:12,659-Speed 3049.07 samples/sec   Loss 8.9814   LearningRate 0.0374   Epoch: 7   Global Step: 96450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:50:16,002-Speed 3064.11 samples/sec   Loss 8.9721   LearningRate 0.0374   Epoch: 7   Global Step: 96460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:50:19,355-Speed 3055.06 samples/sec   Loss 8.7968   LearningRate 0.0374   Epoch: 7   Global Step: 96470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:50:22,777-Speed 2992.96 samples/sec   Loss 8.9173   LearningRate 0.0374   Epoch: 7   Global Step: 96480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:50:26,145-Speed 3041.47 samples/sec   Loss 8.8748   LearningRate 0.0374   Epoch: 7   Global Step: 96490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:50:29,473-Speed 3077.31 samples/sec   Loss 9.0710   LearningRate 0.0374   Epoch: 7   Global Step: 96500   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:50:32,790-Speed 3088.96 samples/sec   Loss 8.8966   LearningRate 0.0374   Epoch: 7   Global Step: 96510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:50:36,119-Speed 3076.64 samples/sec   Loss 8.8456   LearningRate 0.0374   Epoch: 7   Global Step: 96520   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:50:39,458-Speed 3067.83 samples/sec   Loss 9.0250   LearningRate 0.0374   Epoch: 7   Global Step: 96530   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:50:42,785-Speed 3079.18 samples/sec   Loss 8.9951   LearningRate 0.0374   Epoch: 7   Global Step: 96540   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:50:46,120-Speed 3070.75 samples/sec   Loss 8.8747   LearningRate 0.0374   Epoch: 7   Global Step: 96550   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:50:49,435-Speed 3090.02 samples/sec   Loss 8.9655   LearningRate 0.0374   Epoch: 7   Global Step: 96560   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:50:52,832-Speed 3015.48 samples/sec   Loss 8.9548   LearningRate 0.0374   Epoch: 7   Global Step: 96570   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:50:56,249-Speed 2997.89 samples/sec   Loss 8.9890   LearningRate 0.0374   Epoch: 7   Global Step: 96580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:50:59,680-Speed 2984.99 samples/sec   Loss 8.9446   LearningRate 0.0374   Epoch: 7   Global Step: 96590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:03,197-Speed 2912.92 samples/sec   Loss 8.9811   LearningRate 0.0373   Epoch: 7   Global Step: 96600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:06,630-Speed 2983.05 samples/sec   Loss 8.9563   LearningRate 0.0373   Epoch: 7   Global Step: 96610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:10,007-Speed 3033.30 samples/sec   Loss 8.7699   LearningRate 0.0373   Epoch: 7   Global Step: 96620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:13,428-Speed 2994.13 samples/sec   Loss 8.9158   LearningRate 0.0373   Epoch: 7   Global Step: 96630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:16,817-Speed 3021.72 samples/sec   Loss 8.8886   LearningRate 0.0373   Epoch: 7   Global Step: 96640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:20,231-Speed 2999.97 samples/sec   Loss 8.8094   LearningRate 0.0373   Epoch: 7   Global Step: 96650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:23,574-Speed 3064.24 samples/sec   Loss 9.0598   LearningRate 0.0373   Epoch: 7   Global Step: 96660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:26,945-Speed 3038.58 samples/sec   Loss 8.8583   LearningRate 0.0373   Epoch: 7   Global Step: 96670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:30,440-Speed 2930.78 samples/sec   Loss 8.9227   LearningRate 0.0373   Epoch: 7   Global Step: 96680   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:51:33,895-Speed 2964.54 samples/sec   Loss 8.9018   LearningRate 0.0373   Epoch: 7   Global Step: 96690   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:51:37,278-Speed 3028.11 samples/sec   Loss 8.7476   LearningRate 0.0373   Epoch: 7   Global Step: 96700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:40,623-Speed 3062.26 samples/sec   Loss 8.7290   LearningRate 0.0373   Epoch: 7   Global Step: 96710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:44,010-Speed 3023.57 samples/sec   Loss 8.9035   LearningRate 0.0373   Epoch: 7   Global Step: 96720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:47,412-Speed 3010.88 samples/sec   Loss 8.9672   LearningRate 0.0373   Epoch: 7   Global Step: 96730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:50,818-Speed 3007.13 samples/sec   Loss 8.8593   LearningRate 0.0373   Epoch: 7   Global Step: 96740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:54,191-Speed 3037.00 samples/sec   Loss 8.7474   LearningRate 0.0373   Epoch: 7   Global Step: 96750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:51:57,570-Speed 3030.93 samples/sec   Loss 8.8330   LearningRate 0.0373   Epoch: 7   Global Step: 96760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:52:00,874-Speed 3100.26 samples/sec   Loss 8.9399   LearningRate 0.0373   Epoch: 7   Global Step: 96770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:52:04,315-Speed 2976.71 samples/sec   Loss 8.9105   LearningRate 0.0373   Epoch: 7   Global Step: 96780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:52:07,763-Speed 2970.82 samples/sec   Loss 8.9022   LearningRate 0.0373   Epoch: 7   Global Step: 96790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:52:11,228-Speed 2956.63 samples/sec   Loss 8.9744   LearningRate 0.0373   Epoch: 7   Global Step: 96800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:52:14,565-Speed 3069.60 samples/sec   Loss 8.9629   LearningRate 0.0372   Epoch: 7   Global Step: 96810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:52:18,019-Speed 2965.38 samples/sec   Loss 9.1618   LearningRate 0.0372   Epoch: 7   Global Step: 96820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:52:21,475-Speed 2963.63 samples/sec   Loss 8.9659   LearningRate 0.0372   Epoch: 7   Global Step: 96830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:52:24,837-Speed 3048.53 samples/sec   Loss 8.8015   LearningRate 0.0372   Epoch: 7   Global Step: 96840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:52:28,310-Speed 2949.53 samples/sec   Loss 8.9325   LearningRate 0.0372   Epoch: 7   Global Step: 96850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:52:31,768-Speed 2961.58 samples/sec   Loss 8.9026   LearningRate 0.0372   Epoch: 7   Global Step: 96860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:52:35,080-Speed 3092.50 samples/sec   Loss 8.8764   LearningRate 0.0372   Epoch: 7   Global Step: 96870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:52:38,478-Speed 3015.11 samples/sec   Loss 8.8995   LearningRate 0.0372   Epoch: 7   Global Step: 96880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:52:41,850-Speed 3036.75 samples/sec   Loss 9.0692   LearningRate 0.0372   Epoch: 7   Global Step: 96890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:52:45,232-Speed 3029.05 samples/sec   Loss 8.6988   LearningRate 0.0372   Epoch: 7   Global Step: 96900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:52:48,632-Speed 3012.89 samples/sec   Loss 9.0782   LearningRate 0.0372   Epoch: 7   Global Step: 96910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:52:51,987-Speed 3053.02 samples/sec   Loss 8.8400   LearningRate 0.0372   Epoch: 7   Global Step: 96920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:52:55,376-Speed 3021.92 samples/sec   Loss 8.7990   LearningRate 0.0372   Epoch: 7   Global Step: 96930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:52:58,850-Speed 2948.98 samples/sec   Loss 9.0187   LearningRate 0.0372   Epoch: 7   Global Step: 96940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:53:02,209-Speed 3049.06 samples/sec   Loss 8.8986   LearningRate 0.0372   Epoch: 7   Global Step: 96950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:53:05,555-Speed 3060.89 samples/sec   Loss 8.9260   LearningRate 0.0372   Epoch: 7   Global Step: 96960   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:53:08,994-Speed 2978.06 samples/sec   Loss 8.9248   LearningRate 0.0372   Epoch: 7   Global Step: 96970   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:53:12,345-Speed 3056.67 samples/sec   Loss 8.8723   LearningRate 0.0372   Epoch: 7   Global Step: 96980   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:53:15,719-Speed 3035.42 samples/sec   Loss 8.8674   LearningRate 0.0372   Epoch: 7   Global Step: 96990   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:53:19,177-Speed 2962.72 samples/sec   Loss 9.0461   LearningRate 0.0372   Epoch: 7   Global Step: 97000   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:53:22,607-Speed 2985.74 samples/sec   Loss 8.8007   LearningRate 0.0371   Epoch: 7   Global Step: 97010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:53:25,984-Speed 3033.18 samples/sec   Loss 8.8801   LearningRate 0.0371   Epoch: 7   Global Step: 97020   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:53:29,436-Speed 2967.16 samples/sec   Loss 9.0077   LearningRate 0.0371   Epoch: 7   Global Step: 97030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:53:32,842-Speed 3007.25 samples/sec   Loss 8.9414   LearningRate 0.0371   Epoch: 7   Global Step: 97040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:53:36,270-Speed 2988.42 samples/sec   Loss 8.8223   LearningRate 0.0371   Epoch: 7   Global Step: 97050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:53:39,689-Speed 2995.38 samples/sec   Loss 8.8359   LearningRate 0.0371   Epoch: 7   Global Step: 97060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:53:43,097-Speed 3005.50 samples/sec   Loss 8.8880   LearningRate 0.0371   Epoch: 7   Global Step: 97070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:53:46,478-Speed 3029.38 samples/sec   Loss 8.8359   LearningRate 0.0371   Epoch: 7   Global Step: 97080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:53:49,891-Speed 3001.36 samples/sec   Loss 8.9863   LearningRate 0.0371   Epoch: 7   Global Step: 97090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:53:53,278-Speed 3024.28 samples/sec   Loss 8.9032   LearningRate 0.0371   Epoch: 7   Global Step: 97100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:53:56,689-Speed 3002.79 samples/sec   Loss 8.8898   LearningRate 0.0371   Epoch: 7   Global Step: 97110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:00,124-Speed 2982.16 samples/sec   Loss 8.9886   LearningRate 0.0371   Epoch: 7   Global Step: 97120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:03,466-Speed 3064.22 samples/sec   Loss 8.7600   LearningRate 0.0371   Epoch: 7   Global Step: 97130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:06,818-Speed 3056.05 samples/sec   Loss 8.9728   LearningRate 0.0371   Epoch: 7   Global Step: 97140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:10,229-Speed 3002.26 samples/sec   Loss 8.8138   LearningRate 0.0371   Epoch: 7   Global Step: 97150   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:54:13,641-Speed 3002.26 samples/sec   Loss 8.7155   LearningRate 0.0371   Epoch: 7   Global Step: 97160   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:54:17,001-Speed 3048.74 samples/sec   Loss 8.9763   LearningRate 0.0371   Epoch: 7   Global Step: 97170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:20,390-Speed 3022.36 samples/sec   Loss 8.7984   LearningRate 0.0371   Epoch: 7   Global Step: 97180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:23,767-Speed 3032.75 samples/sec   Loss 8.8154   LearningRate 0.0371   Epoch: 7   Global Step: 97190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:27,097-Speed 3075.58 samples/sec   Loss 8.8659   LearningRate 0.0371   Epoch: 7   Global Step: 97200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:30,495-Speed 3014.80 samples/sec   Loss 8.8808   LearningRate 0.0370   Epoch: 7   Global Step: 97210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:33,844-Speed 3057.81 samples/sec   Loss 8.8975   LearningRate 0.0370   Epoch: 7   Global Step: 97220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:37,293-Speed 2970.00 samples/sec   Loss 8.8123   LearningRate 0.0370   Epoch: 7   Global Step: 97230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:40,763-Speed 2951.52 samples/sec   Loss 8.9299   LearningRate 0.0370   Epoch: 7   Global Step: 97240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:44,148-Speed 3026.73 samples/sec   Loss 8.7788   LearningRate 0.0370   Epoch: 7   Global Step: 97250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:47,562-Speed 3001.01 samples/sec   Loss 8.9946   LearningRate 0.0370   Epoch: 7   Global Step: 97260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:54:50,891-Speed 3077.06 samples/sec   Loss 8.9142   LearningRate 0.0370   Epoch: 7   Global Step: 97270   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:54:54,287-Speed 3015.63 samples/sec   Loss 8.8197   LearningRate 0.0370   Epoch: 7   Global Step: 97280   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:54:57,702-Speed 2999.80 samples/sec   Loss 9.0053   LearningRate 0.0370   Epoch: 7   Global Step: 97290   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:55:01,194-Speed 2934.31 samples/sec   Loss 8.8058   LearningRate 0.0370   Epoch: 7   Global Step: 97300   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:55:04,565-Speed 3038.20 samples/sec   Loss 8.8742   LearningRate 0.0370   Epoch: 7   Global Step: 97310   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:55:07,892-Speed 3078.05 samples/sec   Loss 8.8328   LearningRate 0.0370   Epoch: 7   Global Step: 97320   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:55:11,245-Speed 3054.67 samples/sec   Loss 8.7775   LearningRate 0.0370   Epoch: 7   Global Step: 97330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:55:14,666-Speed 2994.28 samples/sec   Loss 8.6417   LearningRate 0.0370   Epoch: 7   Global Step: 97340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:55:18,044-Speed 3032.54 samples/sec   Loss 8.7883   LearningRate 0.0370   Epoch: 7   Global Step: 97350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:55:21,400-Speed 3051.64 samples/sec   Loss 8.9307   LearningRate 0.0370   Epoch: 7   Global Step: 97360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:55:24,837-Speed 2981.00 samples/sec   Loss 8.8745   LearningRate 0.0370   Epoch: 7   Global Step: 97370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:55:28,237-Speed 3012.01 samples/sec   Loss 8.7870   LearningRate 0.0370   Epoch: 7   Global Step: 97380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:55:31,609-Speed 3038.05 samples/sec   Loss 9.0396   LearningRate 0.0370   Epoch: 7   Global Step: 97390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:55:35,047-Speed 2978.75 samples/sec   Loss 8.8062   LearningRate 0.0370   Epoch: 7   Global Step: 97400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:55:38,503-Speed 2963.88 samples/sec   Loss 8.9212   LearningRate 0.0370   Epoch: 7   Global Step: 97410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:55:41,928-Speed 2991.04 samples/sec   Loss 8.9011   LearningRate 0.0369   Epoch: 7   Global Step: 97420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:55:45,322-Speed 3018.26 samples/sec   Loss 8.9534   LearningRate 0.0369   Epoch: 7   Global Step: 97430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:55:48,697-Speed 3035.26 samples/sec   Loss 8.9385   LearningRate 0.0369   Epoch: 7   Global Step: 97440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:55:52,041-Speed 3064.00 samples/sec   Loss 8.9726   LearningRate 0.0369   Epoch: 7   Global Step: 97450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:55:55,395-Speed 3053.07 samples/sec   Loss 8.7924   LearningRate 0.0369   Epoch: 7   Global Step: 97460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:55:58,779-Speed 3027.40 samples/sec   Loss 8.8835   LearningRate 0.0369   Epoch: 7   Global Step: 97470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:56:02,175-Speed 3016.51 samples/sec   Loss 8.8698   LearningRate 0.0369   Epoch: 7   Global Step: 97480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:56:05,518-Speed 3064.04 samples/sec   Loss 8.8000   LearningRate 0.0369   Epoch: 7   Global Step: 97490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:56:08,851-Speed 3073.15 samples/sec   Loss 8.9321   LearningRate 0.0369   Epoch: 7   Global Step: 97500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:56:12,194-Speed 3064.00 samples/sec   Loss 8.9502   LearningRate 0.0369   Epoch: 7   Global Step: 97510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:56:15,574-Speed 3029.76 samples/sec   Loss 8.9073   LearningRate 0.0369   Epoch: 7   Global Step: 97520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:56:18,933-Speed 3049.66 samples/sec   Loss 8.8988   LearningRate 0.0369   Epoch: 7   Global Step: 97530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:56:22,279-Speed 3061.80 samples/sec   Loss 8.8264   LearningRate 0.0369   Epoch: 7   Global Step: 97540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:56:25,598-Speed 3085.25 samples/sec   Loss 8.8081   LearningRate 0.0369   Epoch: 7   Global Step: 97550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:56:29,041-Speed 2975.03 samples/sec   Loss 8.9016   LearningRate 0.0369   Epoch: 7   Global Step: 97560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:56:32,402-Speed 3047.96 samples/sec   Loss 9.0321   LearningRate 0.0369   Epoch: 7   Global Step: 97570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:56:35,755-Speed 3054.25 samples/sec   Loss 8.8818   LearningRate 0.0369   Epoch: 7   Global Step: 97580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:56:39,169-Speed 3000.45 samples/sec   Loss 8.8898   LearningRate 0.0369   Epoch: 7   Global Step: 97590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:56:42,603-Speed 2982.96 samples/sec   Loss 8.7834   LearningRate 0.0369   Epoch: 7   Global Step: 97600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:56:45,942-Speed 3067.56 samples/sec   Loss 8.6872   LearningRate 0.0369   Epoch: 7   Global Step: 97610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:56:49,324-Speed 3028.13 samples/sec   Loss 8.9714   LearningRate 0.0368   Epoch: 7   Global Step: 97620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:56:52,780-Speed 2964.20 samples/sec   Loss 8.8118   LearningRate 0.0368   Epoch: 7   Global Step: 97630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:56:56,110-Speed 3075.91 samples/sec   Loss 8.8465   LearningRate 0.0368   Epoch: 7   Global Step: 97640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:56:59,485-Speed 3035.07 samples/sec   Loss 8.8060   LearningRate 0.0368   Epoch: 7   Global Step: 97650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:57:02,817-Speed 3074.34 samples/sec   Loss 8.8788   LearningRate 0.0368   Epoch: 7   Global Step: 97660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:57:06,174-Speed 3050.94 samples/sec   Loss 8.8354   LearningRate 0.0368   Epoch: 7   Global Step: 97670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:57:09,608-Speed 2982.58 samples/sec   Loss 8.8231   LearningRate 0.0368   Epoch: 7   Global Step: 97680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:57:13,039-Speed 2985.41 samples/sec   Loss 8.9274   LearningRate 0.0368   Epoch: 7   Global Step: 97690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:57:16,487-Speed 2970.88 samples/sec   Loss 8.8245   LearningRate 0.0368   Epoch: 7   Global Step: 97700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:57:19,833-Speed 3061.26 samples/sec   Loss 8.6798   LearningRate 0.0368   Epoch: 7   Global Step: 97710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:57:23,338-Speed 2922.18 samples/sec   Loss 8.8842   LearningRate 0.0368   Epoch: 7   Global Step: 97720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:57:26,736-Speed 3014.26 samples/sec   Loss 8.7079   LearningRate 0.0368   Epoch: 7   Global Step: 97730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:57:30,132-Speed 3016.22 samples/sec   Loss 8.7655   LearningRate 0.0368   Epoch: 7   Global Step: 97740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:57:33,515-Speed 3027.99 samples/sec   Loss 8.9700   LearningRate 0.0368   Epoch: 7   Global Step: 97750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:57:36,936-Speed 2994.17 samples/sec   Loss 8.8206   LearningRate 0.0368   Epoch: 7   Global Step: 97760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:57:40,266-Speed 3076.47 samples/sec   Loss 8.6962   LearningRate 0.0368   Epoch: 7   Global Step: 97770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:57:43,628-Speed 3046.70 samples/sec   Loss 8.7418   LearningRate 0.0368   Epoch: 7   Global Step: 97780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:57:47,017-Speed 3022.85 samples/sec   Loss 8.8790   LearningRate 0.0368   Epoch: 7   Global Step: 97790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 10:57:50,346-Speed 3076.99 samples/sec   Loss 8.6843   LearningRate 0.0368   Epoch: 7   Global Step: 97800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:57:53,781-Speed 2981.80 samples/sec   Loss 8.8219   LearningRate 0.0368   Epoch: 7   Global Step: 97810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:57:57,207-Speed 2989.74 samples/sec   Loss 8.8333   LearningRate 0.0368   Epoch: 7   Global Step: 97820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:58:00,565-Speed 3050.81 samples/sec   Loss 8.8749   LearningRate 0.0367   Epoch: 7   Global Step: 97830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:58:03,937-Speed 3037.83 samples/sec   Loss 8.7909   LearningRate 0.0367   Epoch: 7   Global Step: 97840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:58:07,367-Speed 2985.46 samples/sec   Loss 8.8529   LearningRate 0.0367   Epoch: 7   Global Step: 97850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:58:10,750-Speed 3028.17 samples/sec   Loss 8.6793   LearningRate 0.0367   Epoch: 7   Global Step: 97860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:58:14,193-Speed 2975.23 samples/sec   Loss 8.8471   LearningRate 0.0367   Epoch: 7   Global Step: 97870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:58:17,552-Speed 3049.06 samples/sec   Loss 8.7698   LearningRate 0.0367   Epoch: 7   Global Step: 97880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:58:20,891-Speed 3067.90 samples/sec   Loss 8.8465   LearningRate 0.0367   Epoch: 7   Global Step: 97890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:58:24,283-Speed 3019.90 samples/sec   Loss 8.7581   LearningRate 0.0367   Epoch: 7   Global Step: 97900   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:58:27,721-Speed 2979.53 samples/sec   Loss 8.9542   LearningRate 0.0367   Epoch: 7   Global Step: 97910   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:58:31,133-Speed 3002.06 samples/sec   Loss 8.6208   LearningRate 0.0367   Epoch: 7   Global Step: 97920   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:58:34,523-Speed 3021.15 samples/sec   Loss 8.8773   LearningRate 0.0367   Epoch: 7   Global Step: 97930   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:58:37,902-Speed 3030.89 samples/sec   Loss 8.9237   LearningRate 0.0367   Epoch: 7   Global Step: 97940   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:58:41,402-Speed 2927.03 samples/sec   Loss 8.7250   LearningRate 0.0367   Epoch: 7   Global Step: 97950   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:58:44,738-Speed 3070.58 samples/sec   Loss 8.8156   LearningRate 0.0367   Epoch: 7   Global Step: 97960   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:58:48,131-Speed 3018.41 samples/sec   Loss 8.7263   LearningRate 0.0367   Epoch: 7   Global Step: 97970   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:58:51,452-Speed 3084.48 samples/sec   Loss 8.8758   LearningRate 0.0367   Epoch: 7   Global Step: 97980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:58:54,890-Speed 2978.68 samples/sec   Loss 8.9573   LearningRate 0.0367   Epoch: 7   Global Step: 97990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:58:58,286-Speed 3016.24 samples/sec   Loss 8.8942   LearningRate 0.0367   Epoch: 7   Global Step: 98000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:59:01,656-Speed 3039.99 samples/sec   Loss 8.9468   LearningRate 0.0367   Epoch: 7   Global Step: 98010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:59:05,011-Speed 3052.47 samples/sec   Loss 8.8953   LearningRate 0.0367   Epoch: 7   Global Step: 98020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:59:08,489-Speed 2945.66 samples/sec   Loss 8.8656   LearningRate 0.0366   Epoch: 7   Global Step: 98030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:59:11,933-Speed 2973.91 samples/sec   Loss 8.9119   LearningRate 0.0366   Epoch: 7   Global Step: 98040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:59:15,325-Speed 3022.95 samples/sec   Loss 8.7959   LearningRate 0.0366   Epoch: 7   Global Step: 98050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:59:18,745-Speed 2994.31 samples/sec   Loss 8.8309   LearningRate 0.0366   Epoch: 7   Global Step: 98060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:59:22,190-Speed 2974.07 samples/sec   Loss 8.7300   LearningRate 0.0366   Epoch: 7   Global Step: 98070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:59:25,615-Speed 2989.76 samples/sec   Loss 8.8800   LearningRate 0.0366   Epoch: 7   Global Step: 98080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:59:29,111-Speed 2930.39 samples/sec   Loss 8.8846   LearningRate 0.0366   Epoch: 7   Global Step: 98090   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:59:32,522-Speed 3002.89 samples/sec   Loss 8.9373   LearningRate 0.0366   Epoch: 7   Global Step: 98100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:59:35,865-Speed 3063.93 samples/sec   Loss 8.9864   LearningRate 0.0366   Epoch: 7   Global Step: 98110   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:59:39,209-Speed 3063.73 samples/sec   Loss 8.8678   LearningRate 0.0366   Epoch: 7   Global Step: 98120   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:59:42,565-Speed 3052.10 samples/sec   Loss 8.8094   LearningRate 0.0366   Epoch: 7   Global Step: 98130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 10:59:45,976-Speed 3002.16 samples/sec   Loss 8.7852   LearningRate 0.0366   Epoch: 7   Global Step: 98140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:59:49,379-Speed 3010.42 samples/sec   Loss 8.7659   LearningRate 0.0366   Epoch: 7   Global Step: 98150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:59:52,791-Speed 3001.99 samples/sec   Loss 8.8008   LearningRate 0.0366   Epoch: 7   Global Step: 98160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:59:56,235-Speed 2973.94 samples/sec   Loss 8.7948   LearningRate 0.0366   Epoch: 7   Global Step: 98170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 10:59:59,713-Speed 2944.77 samples/sec   Loss 8.8538   LearningRate 0.0366   Epoch: 7   Global Step: 98180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:03,157-Speed 2974.31 samples/sec   Loss 8.8353   LearningRate 0.0366   Epoch: 7   Global Step: 98190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:06,551-Speed 3017.68 samples/sec   Loss 8.8099   LearningRate 0.0366   Epoch: 7   Global Step: 98200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:09,901-Speed 3058.29 samples/sec   Loss 8.7490   LearningRate 0.0366   Epoch: 7   Global Step: 98210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:13,373-Speed 2949.86 samples/sec   Loss 8.8203   LearningRate 0.0366   Epoch: 7   Global Step: 98220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:16,765-Speed 3023.56 samples/sec   Loss 8.7997   LearningRate 0.0366   Epoch: 7   Global Step: 98230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:20,121-Speed 3052.28 samples/sec   Loss 8.9117   LearningRate 0.0365   Epoch: 7   Global Step: 98240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:23,545-Speed 2991.56 samples/sec   Loss 8.8297   LearningRate 0.0365   Epoch: 7   Global Step: 98250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:26,995-Speed 2968.30 samples/sec   Loss 8.7606   LearningRate 0.0365   Epoch: 7   Global Step: 98260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:30,397-Speed 3011.56 samples/sec   Loss 8.7752   LearningRate 0.0365   Epoch: 7   Global Step: 98270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:33,782-Speed 3026.30 samples/sec   Loss 8.8297   LearningRate 0.0365   Epoch: 7   Global Step: 98280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:37,138-Speed 3051.31 samples/sec   Loss 8.7576   LearningRate 0.0365   Epoch: 7   Global Step: 98290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:40,449-Speed 3094.23 samples/sec   Loss 8.7964   LearningRate 0.0365   Epoch: 7   Global Step: 98300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:43,863-Speed 2999.94 samples/sec   Loss 8.8885   LearningRate 0.0365   Epoch: 7   Global Step: 98310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:47,311-Speed 2970.79 samples/sec   Loss 8.8308   LearningRate 0.0365   Epoch: 7   Global Step: 98320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:50,743-Speed 2985.08 samples/sec   Loss 8.8482   LearningRate 0.0365   Epoch: 7   Global Step: 98330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:00:54,141-Speed 3013.89 samples/sec   Loss 8.7998   LearningRate 0.0365   Epoch: 7   Global Step: 98340   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:00:57,580-Speed 2978.54 samples/sec   Loss 8.7347   LearningRate 0.0365   Epoch: 7   Global Step: 98350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:01:00,897-Speed 3088.55 samples/sec   Loss 8.8931   LearningRate 0.0365   Epoch: 7   Global Step: 98360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:01:04,250-Speed 3054.85 samples/sec   Loss 8.6643   LearningRate 0.0365   Epoch: 7   Global Step: 98370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:01:07,621-Speed 3038.77 samples/sec   Loss 8.8635   LearningRate 0.0365   Epoch: 7   Global Step: 98380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:01:10,973-Speed 3055.87 samples/sec   Loss 8.8141   LearningRate 0.0365   Epoch: 7   Global Step: 98390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:01:14,293-Speed 3085.11 samples/sec   Loss 8.6422   LearningRate 0.0365   Epoch: 7   Global Step: 98400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:01:17,748-Speed 2964.52 samples/sec   Loss 8.7483   LearningRate 0.0365   Epoch: 7   Global Step: 98410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:01:21,198-Speed 2968.62 samples/sec   Loss 8.8496   LearningRate 0.0365   Epoch: 7   Global Step: 98420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:01:24,527-Speed 3076.98 samples/sec   Loss 8.7667   LearningRate 0.0365   Epoch: 7   Global Step: 98430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:01:27,851-Speed 3081.58 samples/sec   Loss 8.5817   LearningRate 0.0364   Epoch: 7   Global Step: 98440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:01:31,234-Speed 3027.85 samples/sec   Loss 8.8284   LearningRate 0.0364   Epoch: 7   Global Step: 98450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:01:34,611-Speed 3032.94 samples/sec   Loss 8.8021   LearningRate 0.0364   Epoch: 7   Global Step: 98460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:01:38,012-Speed 3012.05 samples/sec   Loss 8.6987   LearningRate 0.0364   Epoch: 7   Global Step: 98470   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:01:41,404-Speed 3019.70 samples/sec   Loss 8.7935   LearningRate 0.0364   Epoch: 7   Global Step: 98480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:01:44,803-Speed 3013.40 samples/sec   Loss 8.7207   LearningRate 0.0364   Epoch: 7   Global Step: 98490   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:01:48,137-Speed 3072.13 samples/sec   Loss 8.7121   LearningRate 0.0364   Epoch: 7   Global Step: 98500   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:01:51,521-Speed 3026.88 samples/sec   Loss 8.6753   LearningRate 0.0364   Epoch: 7   Global Step: 98510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:01:54,907-Speed 3025.37 samples/sec   Loss 8.7374   LearningRate 0.0364   Epoch: 7   Global Step: 98520   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:01:58,305-Speed 3014.56 samples/sec   Loss 8.7676   LearningRate 0.0364   Epoch: 7   Global Step: 98530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:02:01,684-Speed 3031.18 samples/sec   Loss 8.7443   LearningRate 0.0364   Epoch: 7   Global Step: 98540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:02:05,103-Speed 2995.37 samples/sec   Loss 8.8289   LearningRate 0.0364   Epoch: 7   Global Step: 98550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:02:08,535-Speed 2984.79 samples/sec   Loss 8.6730   LearningRate 0.0364   Epoch: 7   Global Step: 98560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:02:11,893-Speed 3050.33 samples/sec   Loss 8.8245   LearningRate 0.0364   Epoch: 7   Global Step: 98570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:02:15,237-Speed 3062.57 samples/sec   Loss 8.6594   LearningRate 0.0364   Epoch: 7   Global Step: 98580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:02:18,662-Speed 2990.85 samples/sec   Loss 8.8145   LearningRate 0.0364   Epoch: 7   Global Step: 98590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:02:22,142-Speed 2943.59 samples/sec   Loss 8.6629   LearningRate 0.0364   Epoch: 7   Global Step: 98600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:02:25,577-Speed 2981.56 samples/sec   Loss 8.7493   LearningRate 0.0364   Epoch: 7   Global Step: 98610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:02:28,961-Speed 3026.44 samples/sec   Loss 8.8718   LearningRate 0.0364   Epoch: 7   Global Step: 98620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:02:32,363-Speed 3010.82 samples/sec   Loss 8.7751   LearningRate 0.0364   Epoch: 7   Global Step: 98630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:02:35,805-Speed 2975.84 samples/sec   Loss 8.9291   LearningRate 0.0364   Epoch: 7   Global Step: 98640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:02:39,194-Speed 3022.86 samples/sec   Loss 8.8631   LearningRate 0.0363   Epoch: 7   Global Step: 98650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:02:42,623-Speed 2987.05 samples/sec   Loss 8.8539   LearningRate 0.0363   Epoch: 7   Global Step: 98660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:02:46,002-Speed 3031.72 samples/sec   Loss 8.8526   LearningRate 0.0363   Epoch: 7   Global Step: 98670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:02:49,451-Speed 2969.13 samples/sec   Loss 8.7111   LearningRate 0.0363   Epoch: 7   Global Step: 98680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:02:52,843-Speed 3020.08 samples/sec   Loss 8.7559   LearningRate 0.0363   Epoch: 7   Global Step: 98690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:02:56,243-Speed 3012.10 samples/sec   Loss 8.8666   LearningRate 0.0363   Epoch: 7   Global Step: 98700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:02:59,673-Speed 2986.71 samples/sec   Loss 8.8236   LearningRate 0.0363   Epoch: 7   Global Step: 98710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:03:03,061-Speed 3023.33 samples/sec   Loss 8.8859   LearningRate 0.0363   Epoch: 7   Global Step: 98720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:03:06,463-Speed 3011.36 samples/sec   Loss 8.7962   LearningRate 0.0363   Epoch: 7   Global Step: 98730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:03:09,770-Speed 3096.92 samples/sec   Loss 8.7788   LearningRate 0.0363   Epoch: 7   Global Step: 98740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:03:13,103-Speed 3073.44 samples/sec   Loss 8.7348   LearningRate 0.0363   Epoch: 7   Global Step: 98750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:03:16,462-Speed 3048.61 samples/sec   Loss 8.9729   LearningRate 0.0363   Epoch: 7   Global Step: 98760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:03:19,822-Speed 3048.66 samples/sec   Loss 8.8586   LearningRate 0.0363   Epoch: 7   Global Step: 98770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:03:23,176-Speed 3054.40 samples/sec   Loss 8.6978   LearningRate 0.0363   Epoch: 7   Global Step: 98780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:03:26,561-Speed 3025.12 samples/sec   Loss 8.6570   LearningRate 0.0363   Epoch: 7   Global Step: 98790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:03:29,953-Speed 3020.43 samples/sec   Loss 8.7406   LearningRate 0.0363   Epoch: 7   Global Step: 98800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:03:33,337-Speed 3026.36 samples/sec   Loss 8.7189   LearningRate 0.0363   Epoch: 7   Global Step: 98810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:03:36,788-Speed 2968.27 samples/sec   Loss 8.7848   LearningRate 0.0363   Epoch: 7   Global Step: 98820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:03:40,165-Speed 3033.03 samples/sec   Loss 8.7555   LearningRate 0.0363   Epoch: 7   Global Step: 98830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:03:43,616-Speed 2968.13 samples/sec   Loss 8.7859   LearningRate 0.0363   Epoch: 7   Global Step: 98840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:03:47,011-Speed 3016.65 samples/sec   Loss 8.8081   LearningRate 0.0363   Epoch: 7   Global Step: 98850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:03:50,399-Speed 3023.37 samples/sec   Loss 8.9326   LearningRate 0.0362   Epoch: 7   Global Step: 98860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:03:53,729-Speed 3076.01 samples/sec   Loss 8.6891   LearningRate 0.0362   Epoch: 7   Global Step: 98870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:03:57,092-Speed 3045.67 samples/sec   Loss 8.6904   LearningRate 0.0362   Epoch: 7   Global Step: 98880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:00,514-Speed 2993.35 samples/sec   Loss 8.7428   LearningRate 0.0362   Epoch: 7   Global Step: 98890   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:04:03,911-Speed 3014.88 samples/sec   Loss 8.7562   LearningRate 0.0362   Epoch: 7   Global Step: 98900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:07,345-Speed 2983.12 samples/sec   Loss 8.8476   LearningRate 0.0362   Epoch: 7   Global Step: 98910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:10,724-Speed 3031.71 samples/sec   Loss 8.8251   LearningRate 0.0362   Epoch: 7   Global Step: 98920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:14,089-Speed 3043.58 samples/sec   Loss 8.8441   LearningRate 0.0362   Epoch: 7   Global Step: 98930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:17,433-Speed 3063.38 samples/sec   Loss 8.8471   LearningRate 0.0362   Epoch: 7   Global Step: 98940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:20,828-Speed 3016.98 samples/sec   Loss 8.7348   LearningRate 0.0362   Epoch: 7   Global Step: 98950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:24,200-Speed 3037.47 samples/sec   Loss 8.8005   LearningRate 0.0362   Epoch: 7   Global Step: 98960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:27,659-Speed 2960.76 samples/sec   Loss 8.6458   LearningRate 0.0362   Epoch: 7   Global Step: 98970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:31,052-Speed 3019.14 samples/sec   Loss 8.7909   LearningRate 0.0362   Epoch: 7   Global Step: 98980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:34,456-Speed 3008.95 samples/sec   Loss 8.7020   LearningRate 0.0362   Epoch: 7   Global Step: 98990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:37,827-Speed 3038.79 samples/sec   Loss 8.7750   LearningRate 0.0362   Epoch: 7   Global Step: 99000   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:04:41,198-Speed 3038.34 samples/sec   Loss 8.7691   LearningRate 0.0362   Epoch: 7   Global Step: 99010   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:04:44,521-Speed 3083.19 samples/sec   Loss 8.6457   LearningRate 0.0362   Epoch: 7   Global Step: 99020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:47,889-Speed 3041.27 samples/sec   Loss 8.7752   LearningRate 0.0362   Epoch: 7   Global Step: 99030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:51,283-Speed 3017.60 samples/sec   Loss 8.7709   LearningRate 0.0362   Epoch: 7   Global Step: 99040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:54,675-Speed 3019.59 samples/sec   Loss 8.7951   LearningRate 0.0362   Epoch: 7   Global Step: 99050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:04:58,038-Speed 3046.71 samples/sec   Loss 8.6726   LearningRate 0.0361   Epoch: 7   Global Step: 99060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:01,381-Speed 3062.97 samples/sec   Loss 8.7520   LearningRate 0.0361   Epoch: 7   Global Step: 99070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:04,764-Speed 3028.06 samples/sec   Loss 8.6848   LearningRate 0.0361   Epoch: 7   Global Step: 99080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:08,153-Speed 3022.69 samples/sec   Loss 8.8206   LearningRate 0.0361   Epoch: 7   Global Step: 99090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:11,456-Speed 3101.22 samples/sec   Loss 8.7614   LearningRate 0.0361   Epoch: 7   Global Step: 99100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:14,770-Speed 3090.47 samples/sec   Loss 8.8022   LearningRate 0.0361   Epoch: 7   Global Step: 99110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:18,182-Speed 3001.97 samples/sec   Loss 8.8407   LearningRate 0.0361   Epoch: 7   Global Step: 99120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:21,526-Speed 3063.66 samples/sec   Loss 8.8280   LearningRate 0.0361   Epoch: 7   Global Step: 99130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:24,945-Speed 2995.75 samples/sec   Loss 8.6800   LearningRate 0.0361   Epoch: 7   Global Step: 99140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:28,259-Speed 3090.34 samples/sec   Loss 8.8605   LearningRate 0.0361   Epoch: 7   Global Step: 99150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:31,633-Speed 3036.10 samples/sec   Loss 8.7610   LearningRate 0.0361   Epoch: 7   Global Step: 99160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:35,016-Speed 3027.80 samples/sec   Loss 8.7074   LearningRate 0.0361   Epoch: 7   Global Step: 99170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:38,474-Speed 2961.81 samples/sec   Loss 8.8910   LearningRate 0.0361   Epoch: 7   Global Step: 99180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:41,911-Speed 2980.06 samples/sec   Loss 8.7482   LearningRate 0.0361   Epoch: 7   Global Step: 99190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:45,269-Speed 3051.00 samples/sec   Loss 8.5725   LearningRate 0.0361   Epoch: 7   Global Step: 99200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:48,705-Speed 2981.01 samples/sec   Loss 8.7157   LearningRate 0.0361   Epoch: 7   Global Step: 99210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:52,045-Speed 3067.01 samples/sec   Loss 8.7156   LearningRate 0.0361   Epoch: 7   Global Step: 99220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:55,417-Speed 3037.30 samples/sec   Loss 8.7131   LearningRate 0.0361   Epoch: 7   Global Step: 99230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:05:58,834-Speed 2997.81 samples/sec   Loss 8.6120   LearningRate 0.0361   Epoch: 7   Global Step: 99240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:06:02,224-Speed 3021.51 samples/sec   Loss 8.8910   LearningRate 0.0361   Epoch: 7   Global Step: 99250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:06:05,588-Speed 3044.74 samples/sec   Loss 8.6300   LearningRate 0.0361   Epoch: 7   Global Step: 99260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:06:08,920-Speed 3074.47 samples/sec   Loss 8.7194   LearningRate 0.0360   Epoch: 7   Global Step: 99270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:06:12,307-Speed 3023.78 samples/sec   Loss 8.7861   LearningRate 0.0360   Epoch: 7   Global Step: 99280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:06:15,666-Speed 3049.49 samples/sec   Loss 8.6518   LearningRate 0.0360   Epoch: 7   Global Step: 99290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:06:19,131-Speed 2956.31 samples/sec   Loss 8.5905   LearningRate 0.0360   Epoch: 7   Global Step: 99300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:06:22,605-Speed 2948.21 samples/sec   Loss 8.7057   LearningRate 0.0360   Epoch: 7   Global Step: 99310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:06:25,931-Speed 3079.82 samples/sec   Loss 8.7123   LearningRate 0.0360   Epoch: 7   Global Step: 99320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:06:29,296-Speed 3043.75 samples/sec   Loss 8.8262   LearningRate 0.0360   Epoch: 7   Global Step: 99330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:06:32,653-Speed 3050.88 samples/sec   Loss 8.8175   LearningRate 0.0360   Epoch: 7   Global Step: 99340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:06:36,001-Speed 3060.04 samples/sec   Loss 8.6552   LearningRate 0.0360   Epoch: 7   Global Step: 99350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:06:39,631-Speed 2821.74 samples/sec   Loss 8.6464   LearningRate 0.0360   Epoch: 7   Global Step: 99360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:07:11,891-Speed 317.44 samples/sec   Loss 8.3451   LearningRate 0.0360   Epoch: 8   Global Step: 99370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:07:15,531-Speed 2814.11 samples/sec   Loss 7.1314   LearningRate 0.0360   Epoch: 8   Global Step: 99380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:07:19,174-Speed 2812.01 samples/sec   Loss 7.3146   LearningRate 0.0360   Epoch: 8   Global Step: 99390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:07:22,530-Speed 3052.14 samples/sec   Loss 7.1750   LearningRate 0.0360   Epoch: 8   Global Step: 99400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:07:25,929-Speed 3013.39 samples/sec   Loss 7.1601   LearningRate 0.0360   Epoch: 8   Global Step: 99410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:07:29,287-Speed 3050.15 samples/sec   Loss 7.2595   LearningRate 0.0360   Epoch: 8   Global Step: 99420   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:07:32,630-Speed 3065.11 samples/sec   Loss 7.1778   LearningRate 0.0360   Epoch: 8   Global Step: 99430   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:07:36,031-Speed 3011.82 samples/sec   Loss 7.1284   LearningRate 0.0360   Epoch: 8   Global Step: 99440   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:07:39,590-Speed 2877.40 samples/sec   Loss 7.2354   LearningRate 0.0360   Epoch: 8   Global Step: 99450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:07:42,989-Speed 3014.10 samples/sec   Loss 7.3683   LearningRate 0.0360   Epoch: 8   Global Step: 99460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:07:46,330-Speed 3065.42 samples/sec   Loss 7.2832   LearningRate 0.0360   Epoch: 8   Global Step: 99470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:07:49,744-Speed 3000.85 samples/sec   Loss 7.3471   LearningRate 0.0359   Epoch: 8   Global Step: 99480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:07:53,128-Speed 3026.66 samples/sec   Loss 7.2503   LearningRate 0.0359   Epoch: 8   Global Step: 99490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:07:56,463-Speed 3070.87 samples/sec   Loss 7.3702   LearningRate 0.0359   Epoch: 8   Global Step: 99500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:07:59,779-Speed 3088.90 samples/sec   Loss 7.2685   LearningRate 0.0359   Epoch: 8   Global Step: 99510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:08:03,146-Speed 3042.39 samples/sec   Loss 7.3333   LearningRate 0.0359   Epoch: 8   Global Step: 99520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:08:06,468-Speed 3083.11 samples/sec   Loss 7.1936   LearningRate 0.0359   Epoch: 8   Global Step: 99530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:08:09,853-Speed 3026.87 samples/sec   Loss 7.2725   LearningRate 0.0359   Epoch: 8   Global Step: 99540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:08:13,201-Speed 3059.26 samples/sec   Loss 7.2587   LearningRate 0.0359   Epoch: 8   Global Step: 99550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:08:16,509-Speed 3097.09 samples/sec   Loss 7.3167   LearningRate 0.0359   Epoch: 8   Global Step: 99560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:08:19,806-Speed 3106.85 samples/sec   Loss 7.4791   LearningRate 0.0359   Epoch: 8   Global Step: 99570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:08:23,128-Speed 3083.01 samples/sec   Loss 7.3314   LearningRate 0.0359   Epoch: 8   Global Step: 99580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:08:26,447-Speed 3085.64 samples/sec   Loss 7.3774   LearningRate 0.0359   Epoch: 8   Global Step: 99590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:08:29,812-Speed 3044.68 samples/sec   Loss 7.4138   LearningRate 0.0359   Epoch: 8   Global Step: 99600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:08:33,206-Speed 3017.94 samples/sec   Loss 7.3264   LearningRate 0.0359   Epoch: 8   Global Step: 99610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:08:36,579-Speed 3036.46 samples/sec   Loss 7.3207   LearningRate 0.0359   Epoch: 8   Global Step: 99620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:08:40,033-Speed 2965.78 samples/sec   Loss 7.3822   LearningRate 0.0359   Epoch: 8   Global Step: 99630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:08:43,414-Speed 3030.13 samples/sec   Loss 7.4612   LearningRate 0.0359   Epoch: 8   Global Step: 99640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:08:46,815-Speed 3011.88 samples/sec   Loss 7.4116   LearningRate 0.0359   Epoch: 8   Global Step: 99650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:08:50,357-Speed 2891.82 samples/sec   Loss 7.2968   LearningRate 0.0359   Epoch: 8   Global Step: 99660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:08:53,984-Speed 2824.60 samples/sec   Loss 7.4792   LearningRate 0.0359   Epoch: 8   Global Step: 99670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:08:57,449-Speed 2955.98 samples/sec   Loss 7.3006   LearningRate 0.0358   Epoch: 8   Global Step: 99680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:00,824-Speed 3035.12 samples/sec   Loss 7.3465   LearningRate 0.0358   Epoch: 8   Global Step: 99690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:04,208-Speed 3026.74 samples/sec   Loss 7.3286   LearningRate 0.0358   Epoch: 8   Global Step: 99700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:07,606-Speed 3014.46 samples/sec   Loss 7.3858   LearningRate 0.0358   Epoch: 8   Global Step: 99710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:11,027-Speed 2993.58 samples/sec   Loss 7.3979   LearningRate 0.0358   Epoch: 8   Global Step: 99720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:14,446-Speed 2996.31 samples/sec   Loss 7.3965   LearningRate 0.0358   Epoch: 8   Global Step: 99730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:17,821-Speed 3035.18 samples/sec   Loss 7.3293   LearningRate 0.0358   Epoch: 8   Global Step: 99740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:21,264-Speed 2975.05 samples/sec   Loss 7.3937   LearningRate 0.0358   Epoch: 8   Global Step: 99750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:24,631-Speed 3042.05 samples/sec   Loss 7.3941   LearningRate 0.0358   Epoch: 8   Global Step: 99760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:28,024-Speed 3019.13 samples/sec   Loss 7.4289   LearningRate 0.0358   Epoch: 8   Global Step: 99770   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:09:31,401-Speed 3033.80 samples/sec   Loss 7.4566   LearningRate 0.0358   Epoch: 8   Global Step: 99780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:34,747-Speed 3061.11 samples/sec   Loss 7.4410   LearningRate 0.0358   Epoch: 8   Global Step: 99790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:38,101-Speed 3053.62 samples/sec   Loss 7.4193   LearningRate 0.0358   Epoch: 8   Global Step: 99800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:41,409-Speed 3096.88 samples/sec   Loss 7.5260   LearningRate 0.0358   Epoch: 8   Global Step: 99810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:44,729-Speed 3085.57 samples/sec   Loss 7.4995   LearningRate 0.0358   Epoch: 8   Global Step: 99820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:48,848-Speed 2486.38 samples/sec   Loss 7.4539   LearningRate 0.0358   Epoch: 8   Global Step: 99830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:52,184-Speed 3071.20 samples/sec   Loss 7.4625   LearningRate 0.0358   Epoch: 8   Global Step: 99840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:55,542-Speed 3049.51 samples/sec   Loss 7.4311   LearningRate 0.0358   Epoch: 8   Global Step: 99850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:09:58,891-Speed 3058.74 samples/sec   Loss 7.4624   LearningRate 0.0358   Epoch: 8   Global Step: 99860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:10:02,214-Speed 3082.20 samples/sec   Loss 7.5377   LearningRate 0.0358   Epoch: 8   Global Step: 99870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:10:05,591-Speed 3033.15 samples/sec   Loss 7.3936   LearningRate 0.0358   Epoch: 8   Global Step: 99880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:10:08,968-Speed 3033.26 samples/sec   Loss 7.4006   LearningRate 0.0357   Epoch: 8   Global Step: 99890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:10:12,352-Speed 3027.08 samples/sec   Loss 7.5452   LearningRate 0.0357   Epoch: 8   Global Step: 99900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:10:15,716-Speed 3045.28 samples/sec   Loss 7.5889   LearningRate 0.0357   Epoch: 8   Global Step: 99910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:10:19,058-Speed 3064.93 samples/sec   Loss 7.4154   LearningRate 0.0357   Epoch: 8   Global Step: 99920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:10:22,434-Speed 3034.05 samples/sec   Loss 7.5875   LearningRate 0.0357   Epoch: 8   Global Step: 99930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:10:25,815-Speed 3029.37 samples/sec   Loss 7.4723   LearningRate 0.0357   Epoch: 8   Global Step: 99940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:10:29,183-Speed 3041.17 samples/sec   Loss 7.3787   LearningRate 0.0357   Epoch: 8   Global Step: 99950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:10:32,609-Speed 2990.40 samples/sec   Loss 7.4771   LearningRate 0.0357   Epoch: 8   Global Step: 99960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:10:35,929-Speed 3084.98 samples/sec   Loss 7.6808   LearningRate 0.0357   Epoch: 8   Global Step: 99970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:10:39,276-Speed 3060.15 samples/sec   Loss 7.6132   LearningRate 0.0357   Epoch: 8   Global Step: 99980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:10:42,705-Speed 2987.24 samples/sec   Loss 7.6285   LearningRate 0.0357   Epoch: 8   Global Step: 99990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:10:46,050-Speed 3062.24 samples/sec   Loss 7.5130   LearningRate 0.0357   Epoch: 8   Global Step: 100000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:10:49,411-Speed 3048.08 samples/sec   Loss 7.6380   LearningRate 0.0357   Epoch: 8   Global Step: 100010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:10:52,755-Speed 3062.45 samples/sec   Loss 7.4507   LearningRate 0.0357   Epoch: 8   Global Step: 100020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:10:56,117-Speed 3047.07 samples/sec   Loss 7.6197   LearningRate 0.0357   Epoch: 8   Global Step: 100030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:10:59,540-Speed 2992.72 samples/sec   Loss 7.5550   LearningRate 0.0357   Epoch: 8   Global Step: 100040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:02,866-Speed 3079.67 samples/sec   Loss 7.5072   LearningRate 0.0357   Epoch: 8   Global Step: 100050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:06,311-Speed 2973.49 samples/sec   Loss 7.5440   LearningRate 0.0357   Epoch: 8   Global Step: 100060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:09,699-Speed 3022.81 samples/sec   Loss 7.5776   LearningRate 0.0357   Epoch: 8   Global Step: 100070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:13,101-Speed 3011.41 samples/sec   Loss 7.7882   LearningRate 0.0357   Epoch: 8   Global Step: 100080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:16,457-Speed 3052.71 samples/sec   Loss 7.6378   LearningRate 0.0357   Epoch: 8   Global Step: 100090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:19,816-Speed 3048.55 samples/sec   Loss 7.6560   LearningRate 0.0356   Epoch: 8   Global Step: 100100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:11:23,149-Speed 3073.90 samples/sec   Loss 7.5470   LearningRate 0.0356   Epoch: 8   Global Step: 100110   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:11:26,503-Speed 3053.98 samples/sec   Loss 7.5587   LearningRate 0.0356   Epoch: 8   Global Step: 100120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:29,854-Speed 3056.41 samples/sec   Loss 7.6051   LearningRate 0.0356   Epoch: 8   Global Step: 100130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:33,233-Speed 3031.47 samples/sec   Loss 7.4905   LearningRate 0.0356   Epoch: 8   Global Step: 100140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:36,612-Speed 3032.19 samples/sec   Loss 7.6335   LearningRate 0.0356   Epoch: 8   Global Step: 100150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:39,971-Speed 3049.72 samples/sec   Loss 7.6822   LearningRate 0.0356   Epoch: 8   Global Step: 100160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:43,375-Speed 3009.35 samples/sec   Loss 7.6212   LearningRate 0.0356   Epoch: 8   Global Step: 100170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:46,766-Speed 3020.52 samples/sec   Loss 7.6544   LearningRate 0.0356   Epoch: 8   Global Step: 100180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:50,139-Speed 3036.64 samples/sec   Loss 7.6865   LearningRate 0.0356   Epoch: 8   Global Step: 100190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:53,548-Speed 3004.54 samples/sec   Loss 7.5816   LearningRate 0.0356   Epoch: 8   Global Step: 100200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:11:56,949-Speed 3012.03 samples/sec   Loss 7.7824   LearningRate 0.0356   Epoch: 8   Global Step: 100210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:12:00,326-Speed 3032.50 samples/sec   Loss 7.7136   LearningRate 0.0356   Epoch: 8   Global Step: 100220   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:12:03,687-Speed 3048.89 samples/sec   Loss 7.6310   LearningRate 0.0356   Epoch: 8   Global Step: 100230   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:12:07,079-Speed 3019.51 samples/sec   Loss 7.6648   LearningRate 0.0356   Epoch: 8   Global Step: 100240   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:12:10,513-Speed 2982.95 samples/sec   Loss 7.7585   LearningRate 0.0356   Epoch: 8   Global Step: 100250   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:12:13,842-Speed 3076.68 samples/sec   Loss 7.7105   LearningRate 0.0356   Epoch: 8   Global Step: 100260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:12:17,149-Speed 3097.54 samples/sec   Loss 7.6160   LearningRate 0.0356   Epoch: 8   Global Step: 100270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:12:20,456-Speed 3096.39 samples/sec   Loss 7.6795   LearningRate 0.0356   Epoch: 8   Global Step: 100280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:12:23,870-Speed 3000.61 samples/sec   Loss 7.6807   LearningRate 0.0356   Epoch: 8   Global Step: 100290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:12:27,197-Speed 3078.94 samples/sec   Loss 7.5552   LearningRate 0.0356   Epoch: 8   Global Step: 100300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:12:30,528-Speed 3075.52 samples/sec   Loss 7.6675   LearningRate 0.0355   Epoch: 8   Global Step: 100310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:12:33,949-Speed 2993.81 samples/sec   Loss 7.8786   LearningRate 0.0355   Epoch: 8   Global Step: 100320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:12:37,290-Speed 3065.95 samples/sec   Loss 7.8891   LearningRate 0.0355   Epoch: 8   Global Step: 100330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:12:40,619-Speed 3077.46 samples/sec   Loss 7.7168   LearningRate 0.0355   Epoch: 8   Global Step: 100340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:12:43,977-Speed 3050.87 samples/sec   Loss 7.7196   LearningRate 0.0355   Epoch: 8   Global Step: 100350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:12:47,326-Speed 3057.89 samples/sec   Loss 7.7625   LearningRate 0.0355   Epoch: 8   Global Step: 100360   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:12:50,708-Speed 3028.56 samples/sec   Loss 7.7323   LearningRate 0.0355   Epoch: 8   Global Step: 100370   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:12:54,023-Speed 3090.63 samples/sec   Loss 7.6857   LearningRate 0.0355   Epoch: 8   Global Step: 100380   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:12:57,451-Speed 2986.96 samples/sec   Loss 7.7657   LearningRate 0.0355   Epoch: 8   Global Step: 100390   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:13:00,798-Speed 3060.38 samples/sec   Loss 7.7763   LearningRate 0.0355   Epoch: 8   Global Step: 100400   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:13:04,208-Speed 3005.07 samples/sec   Loss 7.7927   LearningRate 0.0355   Epoch: 8   Global Step: 100410   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:13:07,602-Speed 3017.87 samples/sec   Loss 7.8617   LearningRate 0.0355   Epoch: 8   Global Step: 100420   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:13:11,076-Speed 2948.20 samples/sec   Loss 7.7348   LearningRate 0.0355   Epoch: 8   Global Step: 100430   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:13:14,390-Speed 3091.02 samples/sec   Loss 7.8245   LearningRate 0.0355   Epoch: 8   Global Step: 100440   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:13:17,738-Speed 3059.35 samples/sec   Loss 7.6511   LearningRate 0.0355   Epoch: 8   Global Step: 100450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:13:21,113-Speed 3035.24 samples/sec   Loss 7.5906   LearningRate 0.0355   Epoch: 8   Global Step: 100460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:13:24,434-Speed 3084.77 samples/sec   Loss 7.7297   LearningRate 0.0355   Epoch: 8   Global Step: 100470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:13:27,761-Speed 3078.41 samples/sec   Loss 7.7054   LearningRate 0.0355   Epoch: 8   Global Step: 100480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:13:31,087-Speed 3079.99 samples/sec   Loss 7.7374   LearningRate 0.0355   Epoch: 8   Global Step: 100490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:13:34,498-Speed 3002.63 samples/sec   Loss 8.0391   LearningRate 0.0355   Epoch: 8   Global Step: 100500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:13:37,979-Speed 2941.77 samples/sec   Loss 7.7674   LearningRate 0.0355   Epoch: 8   Global Step: 100510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:13:41,376-Speed 3016.27 samples/sec   Loss 7.7639   LearningRate 0.0354   Epoch: 8   Global Step: 100520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:13:44,761-Speed 3025.44 samples/sec   Loss 7.8283   LearningRate 0.0354   Epoch: 8   Global Step: 100530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:13:48,155-Speed 3017.68 samples/sec   Loss 7.8281   LearningRate 0.0354   Epoch: 8   Global Step: 100540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:13:51,570-Speed 2999.43 samples/sec   Loss 7.7697   LearningRate 0.0354   Epoch: 8   Global Step: 100550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:13:54,951-Speed 3030.00 samples/sec   Loss 7.9313   LearningRate 0.0354   Epoch: 8   Global Step: 100560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:13:58,280-Speed 3075.84 samples/sec   Loss 7.8098   LearningRate 0.0354   Epoch: 8   Global Step: 100570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:14:01,674-Speed 3018.64 samples/sec   Loss 7.7978   LearningRate 0.0354   Epoch: 8   Global Step: 100580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:14:05,120-Speed 2972.52 samples/sec   Loss 7.8973   LearningRate 0.0354   Epoch: 8   Global Step: 100590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:08,490-Speed 3038.70 samples/sec   Loss 7.7771   LearningRate 0.0354   Epoch: 8   Global Step: 100600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:11,823-Speed 3074.41 samples/sec   Loss 7.8065   LearningRate 0.0354   Epoch: 8   Global Step: 100610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:15,139-Speed 3088.46 samples/sec   Loss 7.9188   LearningRate 0.0354   Epoch: 8   Global Step: 100620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:18,468-Speed 3077.65 samples/sec   Loss 7.8083   LearningRate 0.0354   Epoch: 8   Global Step: 100630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:21,821-Speed 3054.44 samples/sec   Loss 7.6664   LearningRate 0.0354   Epoch: 8   Global Step: 100640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:25,123-Speed 3102.60 samples/sec   Loss 7.7320   LearningRate 0.0354   Epoch: 8   Global Step: 100650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:28,578-Speed 2965.15 samples/sec   Loss 7.8602   LearningRate 0.0354   Epoch: 8   Global Step: 100660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:31,988-Speed 3003.35 samples/sec   Loss 7.8082   LearningRate 0.0354   Epoch: 8   Global Step: 100670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:35,369-Speed 3029.99 samples/sec   Loss 7.9484   LearningRate 0.0354   Epoch: 8   Global Step: 100680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:38,704-Speed 3071.58 samples/sec   Loss 7.8321   LearningRate 0.0354   Epoch: 8   Global Step: 100690   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:14:42,153-Speed 2969.52 samples/sec   Loss 7.8563   LearningRate 0.0354   Epoch: 8   Global Step: 100700   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:14:45,460-Speed 3097.80 samples/sec   Loss 7.7318   LearningRate 0.0354   Epoch: 8   Global Step: 100710   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:14:48,773-Speed 3091.49 samples/sec   Loss 8.0411   LearningRate 0.0353   Epoch: 8   Global Step: 100720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:52,113-Speed 3066.33 samples/sec   Loss 7.8553   LearningRate 0.0353   Epoch: 8   Global Step: 100730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:55,502-Speed 3022.42 samples/sec   Loss 7.8334   LearningRate 0.0353   Epoch: 8   Global Step: 100740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:14:58,959-Speed 2962.80 samples/sec   Loss 7.8685   LearningRate 0.0353   Epoch: 8   Global Step: 100750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:02,350-Speed 3020.31 samples/sec   Loss 7.9087   LearningRate 0.0353   Epoch: 8   Global Step: 100760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:05,676-Speed 3080.17 samples/sec   Loss 7.8606   LearningRate 0.0353   Epoch: 8   Global Step: 100770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:09,063-Speed 3024.57 samples/sec   Loss 7.8402   LearningRate 0.0353   Epoch: 8   Global Step: 100780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:12,529-Speed 2955.11 samples/sec   Loss 7.9247   LearningRate 0.0353   Epoch: 8   Global Step: 100790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:15,909-Speed 3030.76 samples/sec   Loss 7.8795   LearningRate 0.0353   Epoch: 8   Global Step: 100800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:19,287-Speed 3032.38 samples/sec   Loss 8.0003   LearningRate 0.0353   Epoch: 8   Global Step: 100810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:22,661-Speed 3035.83 samples/sec   Loss 7.8783   LearningRate 0.0353   Epoch: 8   Global Step: 100820   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:15:26,035-Speed 3036.02 samples/sec   Loss 7.9106   LearningRate 0.0353   Epoch: 8   Global Step: 100830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:29,373-Speed 3069.03 samples/sec   Loss 7.8657   LearningRate 0.0353   Epoch: 8   Global Step: 100840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:32,725-Speed 3055.60 samples/sec   Loss 7.8959   LearningRate 0.0353   Epoch: 8   Global Step: 100850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:36,050-Speed 3080.35 samples/sec   Loss 7.9777   LearningRate 0.0353   Epoch: 8   Global Step: 100860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:39,447-Speed 3015.94 samples/sec   Loss 7.9972   LearningRate 0.0353   Epoch: 8   Global Step: 100870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:42,796-Speed 3058.64 samples/sec   Loss 7.8998   LearningRate 0.0353   Epoch: 8   Global Step: 100880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:46,234-Speed 2979.55 samples/sec   Loss 7.8992   LearningRate 0.0353   Epoch: 8   Global Step: 100890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:49,652-Speed 2996.65 samples/sec   Loss 7.9697   LearningRate 0.0353   Epoch: 8   Global Step: 100900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:53,068-Speed 2998.36 samples/sec   Loss 7.9922   LearningRate 0.0353   Epoch: 8   Global Step: 100910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:56,418-Speed 3057.21 samples/sec   Loss 8.0528   LearningRate 0.0353   Epoch: 8   Global Step: 100920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:15:59,787-Speed 3040.22 samples/sec   Loss 7.9242   LearningRate 0.0352   Epoch: 8   Global Step: 100930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:16:03,116-Speed 3077.15 samples/sec   Loss 7.9446   LearningRate 0.0352   Epoch: 8   Global Step: 100940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:16:06,495-Speed 3031.72 samples/sec   Loss 8.0086   LearningRate 0.0352   Epoch: 8   Global Step: 100950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:16:09,879-Speed 3026.72 samples/sec   Loss 7.9094   LearningRate 0.0352   Epoch: 8   Global Step: 100960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:16:13,216-Speed 3069.66 samples/sec   Loss 8.0628   LearningRate 0.0352   Epoch: 8   Global Step: 100970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:16:16,682-Speed 2954.95 samples/sec   Loss 7.9397   LearningRate 0.0352   Epoch: 8   Global Step: 100980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:16:20,134-Speed 2967.11 samples/sec   Loss 7.9410   LearningRate 0.0352   Epoch: 8   Global Step: 100990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:16:23,522-Speed 3023.52 samples/sec   Loss 8.0036   LearningRate 0.0352   Epoch: 8   Global Step: 101000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:16:26,879-Speed 3050.59 samples/sec   Loss 8.0455   LearningRate 0.0352   Epoch: 8   Global Step: 101010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:16:30,306-Speed 2989.01 samples/sec   Loss 7.8875   LearningRate 0.0352   Epoch: 8   Global Step: 101020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:16:33,647-Speed 3066.03 samples/sec   Loss 8.0504   LearningRate 0.0352   Epoch: 8   Global Step: 101030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:16:37,083-Speed 2981.18 samples/sec   Loss 7.9217   LearningRate 0.0352   Epoch: 8   Global Step: 101040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:16:40,512-Speed 2987.15 samples/sec   Loss 7.9317   LearningRate 0.0352   Epoch: 8   Global Step: 101050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:16:43,904-Speed 3020.19 samples/sec   Loss 7.8957   LearningRate 0.0352   Epoch: 8   Global Step: 101060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:16:47,246-Speed 3064.02 samples/sec   Loss 7.9089   LearningRate 0.0352   Epoch: 8   Global Step: 101070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:16:50,646-Speed 3013.07 samples/sec   Loss 8.0180   LearningRate 0.0352   Epoch: 8   Global Step: 101080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:16:54,057-Speed 3002.83 samples/sec   Loss 7.9945   LearningRate 0.0352   Epoch: 8   Global Step: 101090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:16:57,459-Speed 3011.12 samples/sec   Loss 7.9189   LearningRate 0.0352   Epoch: 8   Global Step: 101100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:17:00,868-Speed 3004.27 samples/sec   Loss 7.9558   LearningRate 0.0352   Epoch: 8   Global Step: 101110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:17:04,222-Speed 3054.05 samples/sec   Loss 7.9728   LearningRate 0.0352   Epoch: 8   Global Step: 101120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:17:07,711-Speed 2935.82 samples/sec   Loss 7.9117   LearningRate 0.0352   Epoch: 8   Global Step: 101130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:17:11,088-Speed 3033.22 samples/sec   Loss 7.9994   LearningRate 0.0351   Epoch: 8   Global Step: 101140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:17:14,548-Speed 2960.85 samples/sec   Loss 8.0292   LearningRate 0.0351   Epoch: 8   Global Step: 101150   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:17:17,929-Speed 3029.90 samples/sec   Loss 8.0912   LearningRate 0.0351   Epoch: 8   Global Step: 101160   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:17:21,306-Speed 3032.14 samples/sec   Loss 8.1336   LearningRate 0.0351   Epoch: 8   Global Step: 101170   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:17:24,679-Speed 3037.46 samples/sec   Loss 8.1369   LearningRate 0.0351   Epoch: 8   Global Step: 101180   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:17:28,111-Speed 2984.37 samples/sec   Loss 7.8505   LearningRate 0.0351   Epoch: 8   Global Step: 101190   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:17:31,454-Speed 3064.13 samples/sec   Loss 8.0211   LearningRate 0.0351   Epoch: 8   Global Step: 101200   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:17:34,874-Speed 2995.02 samples/sec   Loss 8.0240   LearningRate 0.0351   Epoch: 8   Global Step: 101210   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:17:38,273-Speed 3014.05 samples/sec   Loss 7.9739   LearningRate 0.0351   Epoch: 8   Global Step: 101220   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:17:41,730-Speed 2962.71 samples/sec   Loss 7.9676   LearningRate 0.0351   Epoch: 8   Global Step: 101230   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:17:45,114-Speed 3026.98 samples/sec   Loss 7.9761   LearningRate 0.0351   Epoch: 8   Global Step: 101240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:17:48,440-Speed 3079.31 samples/sec   Loss 7.8668   LearningRate 0.0351   Epoch: 8   Global Step: 101250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:17:51,793-Speed 3054.86 samples/sec   Loss 8.0201   LearningRate 0.0351   Epoch: 8   Global Step: 101260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:17:55,161-Speed 3041.78 samples/sec   Loss 8.0889   LearningRate 0.0351   Epoch: 8   Global Step: 101270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:17:58,539-Speed 3031.94 samples/sec   Loss 8.0545   LearningRate 0.0351   Epoch: 8   Global Step: 101280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:01,963-Speed 2991.65 samples/sec   Loss 8.0656   LearningRate 0.0351   Epoch: 8   Global Step: 101290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:05,363-Speed 3012.44 samples/sec   Loss 8.1381   LearningRate 0.0351   Epoch: 8   Global Step: 101300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:08,777-Speed 3000.51 samples/sec   Loss 8.0252   LearningRate 0.0351   Epoch: 8   Global Step: 101310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:12,164-Speed 3023.98 samples/sec   Loss 8.0972   LearningRate 0.0351   Epoch: 8   Global Step: 101320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:15,568-Speed 3008.77 samples/sec   Loss 8.0957   LearningRate 0.0351   Epoch: 8   Global Step: 101330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:19,034-Speed 2955.60 samples/sec   Loss 8.1053   LearningRate 0.0351   Epoch: 8   Global Step: 101340   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:18:22,434-Speed 3012.40 samples/sec   Loss 8.0719   LearningRate 0.0350   Epoch: 8   Global Step: 101350   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:18:25,781-Speed 3060.72 samples/sec   Loss 8.2370   LearningRate 0.0350   Epoch: 8   Global Step: 101360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:29,200-Speed 2995.88 samples/sec   Loss 7.9575   LearningRate 0.0350   Epoch: 8   Global Step: 101370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:32,556-Speed 3051.58 samples/sec   Loss 8.1000   LearningRate 0.0350   Epoch: 8   Global Step: 101380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:35,951-Speed 3017.63 samples/sec   Loss 8.0822   LearningRate 0.0350   Epoch: 8   Global Step: 101390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:39,323-Speed 3038.05 samples/sec   Loss 8.2682   LearningRate 0.0350   Epoch: 8   Global Step: 101400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:42,761-Speed 2979.01 samples/sec   Loss 8.0820   LearningRate 0.0350   Epoch: 8   Global Step: 101410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:46,209-Speed 2970.22 samples/sec   Loss 8.0088   LearningRate 0.0350   Epoch: 8   Global Step: 101420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:49,605-Speed 3017.01 samples/sec   Loss 8.2121   LearningRate 0.0350   Epoch: 8   Global Step: 101430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:53,010-Speed 3007.48 samples/sec   Loss 7.9575   LearningRate 0.0350   Epoch: 8   Global Step: 101440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:56,443-Speed 2984.80 samples/sec   Loss 8.0516   LearningRate 0.0350   Epoch: 8   Global Step: 101450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:18:59,891-Speed 2970.66 samples/sec   Loss 8.3590   LearningRate 0.0350   Epoch: 8   Global Step: 101460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:19:03,244-Speed 3054.39 samples/sec   Loss 8.1479   LearningRate 0.0350   Epoch: 8   Global Step: 101470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:06,636-Speed 3019.72 samples/sec   Loss 7.9825   LearningRate 0.0350   Epoch: 8   Global Step: 101480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:10,033-Speed 3015.75 samples/sec   Loss 8.1047   LearningRate 0.0350   Epoch: 8   Global Step: 101490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:13,418-Speed 3025.87 samples/sec   Loss 8.2024   LearningRate 0.0350   Epoch: 8   Global Step: 101500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:16,831-Speed 3001.28 samples/sec   Loss 8.2209   LearningRate 0.0350   Epoch: 8   Global Step: 101510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:20,180-Speed 3058.28 samples/sec   Loss 8.1142   LearningRate 0.0350   Epoch: 8   Global Step: 101520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:23,501-Speed 3084.65 samples/sec   Loss 8.0681   LearningRate 0.0350   Epoch: 8   Global Step: 101530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:26,838-Speed 3069.38 samples/sec   Loss 8.2010   LearningRate 0.0350   Epoch: 8   Global Step: 101540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:30,338-Speed 2926.36 samples/sec   Loss 8.1931   LearningRate 0.0350   Epoch: 8   Global Step: 101550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:33,792-Speed 2965.11 samples/sec   Loss 8.1815   LearningRate 0.0349   Epoch: 8   Global Step: 101560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:37,289-Speed 2929.81 samples/sec   Loss 8.0437   LearningRate 0.0349   Epoch: 8   Global Step: 101570   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:19:40,736-Speed 2971.52 samples/sec   Loss 8.0222   LearningRate 0.0349   Epoch: 8   Global Step: 101580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:44,117-Speed 3029.27 samples/sec   Loss 8.1585   LearningRate 0.0349   Epoch: 8   Global Step: 101590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:47,541-Speed 2991.77 samples/sec   Loss 8.1066   LearningRate 0.0349   Epoch: 8   Global Step: 101600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:51,019-Speed 2945.35 samples/sec   Loss 8.1146   LearningRate 0.0349   Epoch: 8   Global Step: 101610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:54,439-Speed 2994.64 samples/sec   Loss 8.0734   LearningRate 0.0349   Epoch: 8   Global Step: 101620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:19:57,890-Speed 2968.36 samples/sec   Loss 8.1661   LearningRate 0.0349   Epoch: 8   Global Step: 101630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:20:01,304-Speed 3000.45 samples/sec   Loss 7.9966   LearningRate 0.0349   Epoch: 8   Global Step: 101640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:20:04,675-Speed 3038.53 samples/sec   Loss 8.2689   LearningRate 0.0349   Epoch: 8   Global Step: 101650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:20:08,104-Speed 2987.19 samples/sec   Loss 8.2822   LearningRate 0.0349   Epoch: 8   Global Step: 101660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:20:11,481-Speed 3033.12 samples/sec   Loss 8.1602   LearningRate 0.0349   Epoch: 8   Global Step: 101670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:20:14,806-Speed 3081.48 samples/sec   Loss 8.1678   LearningRate 0.0349   Epoch: 8   Global Step: 101680   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:20:18,139-Speed 3072.97 samples/sec   Loss 8.1275   LearningRate 0.0349   Epoch: 8   Global Step: 101690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:20:21,503-Speed 3044.72 samples/sec   Loss 8.1537   LearningRate 0.0349   Epoch: 8   Global Step: 101700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:20:24,846-Speed 3064.03 samples/sec   Loss 8.0904   LearningRate 0.0349   Epoch: 8   Global Step: 101710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:20:28,282-Speed 2980.96 samples/sec   Loss 8.1441   LearningRate 0.0349   Epoch: 8   Global Step: 101720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:20:31,666-Speed 3027.00 samples/sec   Loss 8.2530   LearningRate 0.0349   Epoch: 8   Global Step: 101730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:20:35,090-Speed 2991.36 samples/sec   Loss 8.2235   LearningRate 0.0349   Epoch: 8   Global Step: 101740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:20:38,475-Speed 3026.64 samples/sec   Loss 8.2056   LearningRate 0.0349   Epoch: 8   Global Step: 101750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:20:41,905-Speed 2986.38 samples/sec   Loss 8.1685   LearningRate 0.0349   Epoch: 8   Global Step: 101760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:20:45,307-Speed 3010.18 samples/sec   Loss 8.1277   LearningRate 0.0348   Epoch: 8   Global Step: 101770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:20:48,737-Speed 2987.02 samples/sec   Loss 8.2594   LearningRate 0.0348   Epoch: 8   Global Step: 101780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:20:52,103-Speed 3042.69 samples/sec   Loss 8.3255   LearningRate 0.0348   Epoch: 8   Global Step: 101790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:20:55,460-Speed 3051.06 samples/sec   Loss 8.0921   LearningRate 0.0348   Epoch: 8   Global Step: 101800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:20:58,894-Speed 2982.95 samples/sec   Loss 8.0854   LearningRate 0.0348   Epoch: 8   Global Step: 101810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:21:02,228-Speed 3071.91 samples/sec   Loss 8.2085   LearningRate 0.0348   Epoch: 8   Global Step: 101820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:21:05,585-Speed 3051.31 samples/sec   Loss 8.1825   LearningRate 0.0348   Epoch: 8   Global Step: 101830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:21:08,995-Speed 3003.56 samples/sec   Loss 8.3646   LearningRate 0.0348   Epoch: 8   Global Step: 101840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:21:12,458-Speed 2958.55 samples/sec   Loss 8.2222   LearningRate 0.0348   Epoch: 8   Global Step: 101850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:21:15,830-Speed 3036.87 samples/sec   Loss 8.2577   LearningRate 0.0348   Epoch: 8   Global Step: 101860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:21:19,207-Speed 3033.28 samples/sec   Loss 8.2383   LearningRate 0.0348   Epoch: 8   Global Step: 101870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:21:22,571-Speed 3044.95 samples/sec   Loss 8.2127   LearningRate 0.0348   Epoch: 8   Global Step: 101880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:21:25,901-Speed 3075.78 samples/sec   Loss 8.1512   LearningRate 0.0348   Epoch: 8   Global Step: 101890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:21:29,314-Speed 3001.58 samples/sec   Loss 8.1876   LearningRate 0.0348   Epoch: 8   Global Step: 101900   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:21:32,731-Speed 2997.20 samples/sec   Loss 8.1462   LearningRate 0.0348   Epoch: 8   Global Step: 101910   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:21:36,082-Speed 3057.00 samples/sec   Loss 8.2378   LearningRate 0.0348   Epoch: 8   Global Step: 101920   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:21:39,495-Speed 3001.01 samples/sec   Loss 8.4164   LearningRate 0.0348   Epoch: 8   Global Step: 101930   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:21:42,869-Speed 3036.12 samples/sec   Loss 8.1357   LearningRate 0.0348   Epoch: 8   Global Step: 101940   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:21:46,322-Speed 2967.07 samples/sec   Loss 8.1619   LearningRate 0.0348   Epoch: 8   Global Step: 101950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:21:49,707-Speed 3025.90 samples/sec   Loss 8.1727   LearningRate 0.0348   Epoch: 8   Global Step: 101960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:21:53,084-Speed 3033.20 samples/sec   Loss 8.1749   LearningRate 0.0348   Epoch: 8   Global Step: 101970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:21:56,466-Speed 3028.96 samples/sec   Loss 8.1836   LearningRate 0.0347   Epoch: 8   Global Step: 101980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:21:59,820-Speed 3054.63 samples/sec   Loss 8.1694   LearningRate 0.0347   Epoch: 8   Global Step: 101990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:03,236-Speed 2998.52 samples/sec   Loss 8.1769   LearningRate 0.0347   Epoch: 8   Global Step: 102000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:06,701-Speed 2957.03 samples/sec   Loss 8.0280   LearningRate 0.0347   Epoch: 8   Global Step: 102010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:10,129-Speed 2987.93 samples/sec   Loss 8.1743   LearningRate 0.0347   Epoch: 8   Global Step: 102020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:13,491-Speed 3046.58 samples/sec   Loss 8.1273   LearningRate 0.0347   Epoch: 8   Global Step: 102030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:16,846-Speed 3053.14 samples/sec   Loss 8.1346   LearningRate 0.0347   Epoch: 8   Global Step: 102040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:20,191-Speed 3062.11 samples/sec   Loss 8.3165   LearningRate 0.0347   Epoch: 8   Global Step: 102050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:23,563-Speed 3037.82 samples/sec   Loss 8.2700   LearningRate 0.0347   Epoch: 8   Global Step: 102060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:22:26,931-Speed 3040.86 samples/sec   Loss 8.0485   LearningRate 0.0347   Epoch: 8   Global Step: 102070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:22:30,266-Speed 3071.85 samples/sec   Loss 8.0909   LearningRate 0.0347   Epoch: 8   Global Step: 102080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:22:33,632-Speed 3042.88 samples/sec   Loss 8.2787   LearningRate 0.0347   Epoch: 8   Global Step: 102090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:22:37,007-Speed 3034.98 samples/sec   Loss 8.2680   LearningRate 0.0347   Epoch: 8   Global Step: 102100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:40,456-Speed 2970.25 samples/sec   Loss 8.2457   LearningRate 0.0347   Epoch: 8   Global Step: 102110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:43,851-Speed 3016.86 samples/sec   Loss 8.2880   LearningRate 0.0347   Epoch: 8   Global Step: 102120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:47,188-Speed 3069.44 samples/sec   Loss 8.3259   LearningRate 0.0347   Epoch: 8   Global Step: 102130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:50,574-Speed 3024.72 samples/sec   Loss 8.3071   LearningRate 0.0347   Epoch: 8   Global Step: 102140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:53,916-Speed 3065.65 samples/sec   Loss 8.2635   LearningRate 0.0347   Epoch: 8   Global Step: 102150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:22:57,285-Speed 3039.59 samples/sec   Loss 8.2422   LearningRate 0.0347   Epoch: 8   Global Step: 102160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:23:00,713-Speed 2988.32 samples/sec   Loss 8.3085   LearningRate 0.0347   Epoch: 8   Global Step: 102170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:23:04,108-Speed 3016.45 samples/sec   Loss 8.2392   LearningRate 0.0347   Epoch: 8   Global Step: 102180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:23:07,471-Speed 3045.95 samples/sec   Loss 8.1586   LearningRate 0.0346   Epoch: 8   Global Step: 102190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:23:10,839-Speed 3041.72 samples/sec   Loss 8.2982   LearningRate 0.0346   Epoch: 8   Global Step: 102200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:23:14,220-Speed 3029.48 samples/sec   Loss 8.1618   LearningRate 0.0346   Epoch: 8   Global Step: 102210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:23:17,552-Speed 3073.82 samples/sec   Loss 8.2488   LearningRate 0.0346   Epoch: 8   Global Step: 102220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:23:20,995-Speed 2975.65 samples/sec   Loss 8.3157   LearningRate 0.0346   Epoch: 8   Global Step: 102230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:23:24,428-Speed 2983.59 samples/sec   Loss 8.2794   LearningRate 0.0346   Epoch: 8   Global Step: 102240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:23:27,853-Speed 2990.32 samples/sec   Loss 8.3177   LearningRate 0.0346   Epoch: 8   Global Step: 102250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:23:31,274-Speed 2993.50 samples/sec   Loss 8.2823   LearningRate 0.0346   Epoch: 8   Global Step: 102260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:23:34,612-Speed 3069.26 samples/sec   Loss 8.2192   LearningRate 0.0346   Epoch: 8   Global Step: 102270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:23:37,950-Speed 3068.10 samples/sec   Loss 8.3882   LearningRate 0.0346   Epoch: 8   Global Step: 102280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:23:41,340-Speed 3021.60 samples/sec   Loss 8.2484   LearningRate 0.0346   Epoch: 8   Global Step: 102290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:23:44,768-Speed 2988.11 samples/sec   Loss 8.2133   LearningRate 0.0346   Epoch: 8   Global Step: 102300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:23:48,165-Speed 3015.50 samples/sec   Loss 8.3357   LearningRate 0.0346   Epoch: 8   Global Step: 102310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:23:51,655-Speed 2934.96 samples/sec   Loss 8.3843   LearningRate 0.0346   Epoch: 8   Global Step: 102320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:23:55,101-Speed 2972.29 samples/sec   Loss 8.4035   LearningRate 0.0346   Epoch: 8   Global Step: 102330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:23:58,547-Speed 2972.38 samples/sec   Loss 8.2882   LearningRate 0.0346   Epoch: 8   Global Step: 102340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:24:01,965-Speed 2996.79 samples/sec   Loss 8.2837   LearningRate 0.0346   Epoch: 8   Global Step: 102350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:24:05,314-Speed 3058.11 samples/sec   Loss 8.2095   LearningRate 0.0346   Epoch: 8   Global Step: 102360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:24:08,779-Speed 2956.60 samples/sec   Loss 8.1861   LearningRate 0.0346   Epoch: 8   Global Step: 102370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:24:12,178-Speed 3013.11 samples/sec   Loss 8.2953   LearningRate 0.0346   Epoch: 8   Global Step: 102380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:15,517-Speed 3067.20 samples/sec   Loss 8.2444   LearningRate 0.0346   Epoch: 8   Global Step: 102390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:18,905-Speed 3023.62 samples/sec   Loss 8.3037   LearningRate 0.0346   Epoch: 8   Global Step: 102400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:22,214-Speed 3095.59 samples/sec   Loss 8.2676   LearningRate 0.0345   Epoch: 8   Global Step: 102410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:25,528-Speed 3090.28 samples/sec   Loss 8.2685   LearningRate 0.0345   Epoch: 8   Global Step: 102420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:28,867-Speed 3068.81 samples/sec   Loss 8.4220   LearningRate 0.0345   Epoch: 8   Global Step: 102430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:32,168-Speed 3102.03 samples/sec   Loss 8.2903   LearningRate 0.0345   Epoch: 8   Global Step: 102440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:35,605-Speed 2980.64 samples/sec   Loss 8.3098   LearningRate 0.0345   Epoch: 8   Global Step: 102450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:39,016-Speed 3002.86 samples/sec   Loss 8.3123   LearningRate 0.0345   Epoch: 8   Global Step: 102460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:42,447-Speed 2985.37 samples/sec   Loss 8.2530   LearningRate 0.0345   Epoch: 8   Global Step: 102470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:45,781-Speed 3072.27 samples/sec   Loss 8.2928   LearningRate 0.0345   Epoch: 8   Global Step: 102480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:24:49,131-Speed 3057.29 samples/sec   Loss 8.2725   LearningRate 0.0345   Epoch: 8   Global Step: 102490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:52,558-Speed 2989.36 samples/sec   Loss 8.2933   LearningRate 0.0345   Epoch: 8   Global Step: 102500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:56,004-Speed 2972.06 samples/sec   Loss 8.3472   LearningRate 0.0345   Epoch: 8   Global Step: 102510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:24:59,404-Speed 3012.88 samples/sec   Loss 8.3800   LearningRate 0.0345   Epoch: 8   Global Step: 102520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:25:02,783-Speed 3030.79 samples/sec   Loss 8.2165   LearningRate 0.0345   Epoch: 8   Global Step: 102530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:25:06,135-Speed 3056.31 samples/sec   Loss 8.4148   LearningRate 0.0345   Epoch: 8   Global Step: 102540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:25:09,503-Speed 3040.85 samples/sec   Loss 8.2725   LearningRate 0.0345   Epoch: 8   Global Step: 102550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:25:12,825-Speed 3083.65 samples/sec   Loss 8.3369   LearningRate 0.0345   Epoch: 8   Global Step: 102560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:25:16,206-Speed 3028.90 samples/sec   Loss 8.3541   LearningRate 0.0345   Epoch: 8   Global Step: 102570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:25:19,577-Speed 3038.74 samples/sec   Loss 8.4367   LearningRate 0.0345   Epoch: 8   Global Step: 102580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:25:22,920-Speed 3064.75 samples/sec   Loss 8.3992   LearningRate 0.0345   Epoch: 8   Global Step: 102590   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:25:26,213-Speed 3109.94 samples/sec   Loss 8.3213   LearningRate 0.0345   Epoch: 8   Global Step: 102600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:25:29,647-Speed 2982.47 samples/sec   Loss 8.4049   LearningRate 0.0345   Epoch: 8   Global Step: 102610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:25:32,998-Speed 3056.79 samples/sec   Loss 8.3075   LearningRate 0.0344   Epoch: 8   Global Step: 102620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:25:36,376-Speed 3032.11 samples/sec   Loss 8.3383   LearningRate 0.0344   Epoch: 8   Global Step: 102630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:25:39,764-Speed 3022.91 samples/sec   Loss 8.3061   LearningRate 0.0344   Epoch: 8   Global Step: 102640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:25:43,117-Speed 3054.82 samples/sec   Loss 8.3672   LearningRate 0.0344   Epoch: 8   Global Step: 102650   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 11:25:46,561-Speed 2974.22 samples/sec   Loss 8.3413   LearningRate 0.0344   Epoch: 8   Global Step: 102660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 11:25:50,003-Speed 2975.89 samples/sec   Loss 8.3636   LearningRate 0.0344   Epoch: 8   Global Step: 102670   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 11:25:53,436-Speed 2983.32 samples/sec   Loss 8.3880   LearningRate 0.0344   Epoch: 8   Global Step: 102680   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 11:25:56,815-Speed 3031.85 samples/sec   Loss 8.2390   LearningRate 0.0344   Epoch: 8   Global Step: 102690   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 11:26:00,333-Speed 2911.49 samples/sec   Loss 8.3847   LearningRate 0.0344   Epoch: 8   Global Step: 102700   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 11:26:03,679-Speed 3060.28 samples/sec   Loss 8.3872   LearningRate 0.0344   Epoch: 8   Global Step: 102710   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 11:26:07,061-Speed 3029.25 samples/sec   Loss 8.3923   LearningRate 0.0344   Epoch: 8   Global Step: 102720   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 11:26:10,375-Speed 3090.28 samples/sec   Loss 8.2796   LearningRate 0.0344   Epoch: 8   Global Step: 102730   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 11:26:13,730-Speed 3053.33 samples/sec   Loss 8.2465   LearningRate 0.0344   Epoch: 8   Global Step: 102740   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 11:26:17,091-Speed 3047.69 samples/sec   Loss 8.2492   LearningRate 0.0344   Epoch: 8   Global Step: 102750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:26:20,537-Speed 2972.15 samples/sec   Loss 8.3696   LearningRate 0.0344   Epoch: 8   Global Step: 102760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:26:23,879-Speed 3064.98 samples/sec   Loss 8.2492   LearningRate 0.0344   Epoch: 8   Global Step: 102770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:26:27,300-Speed 2993.70 samples/sec   Loss 8.3883   LearningRate 0.0344   Epoch: 8   Global Step: 102780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:26:30,663-Speed 3045.99 samples/sec   Loss 8.3468   LearningRate 0.0344   Epoch: 8   Global Step: 102790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:26:34,014-Speed 3055.90 samples/sec   Loss 8.2586   LearningRate 0.0344   Epoch: 8   Global Step: 102800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:26:37,402-Speed 3023.76 samples/sec   Loss 8.2845   LearningRate 0.0344   Epoch: 8   Global Step: 102810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:26:40,786-Speed 3026.37 samples/sec   Loss 8.2714   LearningRate 0.0344   Epoch: 8   Global Step: 102820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:26:44,183-Speed 3015.64 samples/sec   Loss 8.2611   LearningRate 0.0343   Epoch: 8   Global Step: 102830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:26:47,564-Speed 3029.81 samples/sec   Loss 8.3605   LearningRate 0.0343   Epoch: 8   Global Step: 102840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:26:50,873-Speed 3095.10 samples/sec   Loss 8.4252   LearningRate 0.0343   Epoch: 8   Global Step: 102850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:26:54,221-Speed 3059.77 samples/sec   Loss 8.3168   LearningRate 0.0343   Epoch: 8   Global Step: 102860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:26:57,590-Speed 3040.20 samples/sec   Loss 8.3159   LearningRate 0.0343   Epoch: 8   Global Step: 102870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:00,947-Speed 3051.80 samples/sec   Loss 8.3022   LearningRate 0.0343   Epoch: 8   Global Step: 102880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:04,368-Speed 2994.08 samples/sec   Loss 8.2961   LearningRate 0.0343   Epoch: 8   Global Step: 102890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:07,731-Speed 3045.91 samples/sec   Loss 8.3447   LearningRate 0.0343   Epoch: 8   Global Step: 102900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:11,149-Speed 2996.43 samples/sec   Loss 8.3945   LearningRate 0.0343   Epoch: 8   Global Step: 102910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:14,519-Speed 3039.26 samples/sec   Loss 8.3415   LearningRate 0.0343   Epoch: 8   Global Step: 102920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:17,953-Speed 2983.06 samples/sec   Loss 8.3377   LearningRate 0.0343   Epoch: 8   Global Step: 102930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:21,341-Speed 3022.93 samples/sec   Loss 8.4296   LearningRate 0.0343   Epoch: 8   Global Step: 102940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:24,772-Speed 2985.56 samples/sec   Loss 8.2287   LearningRate 0.0343   Epoch: 8   Global Step: 102950   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:27:28,133-Speed 3048.35 samples/sec   Loss 8.3038   LearningRate 0.0343   Epoch: 8   Global Step: 102960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:31,577-Speed 2973.78 samples/sec   Loss 8.4366   LearningRate 0.0343   Epoch: 8   Global Step: 102970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:34,908-Speed 3074.77 samples/sec   Loss 8.4450   LearningRate 0.0343   Epoch: 8   Global Step: 102980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:38,247-Speed 3068.94 samples/sec   Loss 8.3484   LearningRate 0.0343   Epoch: 8   Global Step: 102990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:41,550-Speed 3101.07 samples/sec   Loss 8.3170   LearningRate 0.0343   Epoch: 8   Global Step: 103000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:44,968-Speed 2996.42 samples/sec   Loss 8.3047   LearningRate 0.0343   Epoch: 8   Global Step: 103010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:48,356-Speed 3023.39 samples/sec   Loss 8.3631   LearningRate 0.0343   Epoch: 8   Global Step: 103020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:27:51,695-Speed 3067.84 samples/sec   Loss 8.2420   LearningRate 0.0343   Epoch: 8   Global Step: 103030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:27:55,065-Speed 3039.66 samples/sec   Loss 8.1933   LearningRate 0.0342   Epoch: 8   Global Step: 103040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:27:58,434-Speed 3040.16 samples/sec   Loss 8.3451   LearningRate 0.0342   Epoch: 8   Global Step: 103050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:28:01,780-Speed 3060.92 samples/sec   Loss 8.2958   LearningRate 0.0342   Epoch: 8   Global Step: 103060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:28:05,123-Speed 3064.20 samples/sec   Loss 8.1775   LearningRate 0.0342   Epoch: 8   Global Step: 103070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:28:08,461-Speed 3068.42 samples/sec   Loss 8.2809   LearningRate 0.0342   Epoch: 8   Global Step: 103080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:28:11,789-Speed 3078.95 samples/sec   Loss 8.4287   LearningRate 0.0342   Epoch: 8   Global Step: 103090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:28:15,139-Speed 3057.09 samples/sec   Loss 8.3409   LearningRate 0.0342   Epoch: 8   Global Step: 103100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:28:18,460-Speed 3084.11 samples/sec   Loss 8.3758   LearningRate 0.0342   Epoch: 8   Global Step: 103110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:28:21,896-Speed 2981.18 samples/sec   Loss 8.3588   LearningRate 0.0342   Epoch: 8   Global Step: 103120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:28:25,338-Speed 2976.51 samples/sec   Loss 8.2894   LearningRate 0.0342   Epoch: 8   Global Step: 103130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:28:28,781-Speed 2974.97 samples/sec   Loss 8.3303   LearningRate 0.0342   Epoch: 8   Global Step: 103140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:28:32,096-Speed 3089.57 samples/sec   Loss 8.2796   LearningRate 0.0342   Epoch: 8   Global Step: 103150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:28:35,453-Speed 3051.02 samples/sec   Loss 8.4377   LearningRate 0.0342   Epoch: 8   Global Step: 103160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:28:38,791-Speed 3068.11 samples/sec   Loss 8.5243   LearningRate 0.0342   Epoch: 8   Global Step: 103170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:28:42,176-Speed 3026.25 samples/sec   Loss 8.4169   LearningRate 0.0342   Epoch: 8   Global Step: 103180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:28:45,536-Speed 3048.44 samples/sec   Loss 8.2953   LearningRate 0.0342   Epoch: 8   Global Step: 103190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:28:48,914-Speed 3032.36 samples/sec   Loss 8.3225   LearningRate 0.0342   Epoch: 8   Global Step: 103200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:28:52,289-Speed 3034.73 samples/sec   Loss 8.3509   LearningRate 0.0342   Epoch: 8   Global Step: 103210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:28:55,718-Speed 2986.71 samples/sec   Loss 8.3476   LearningRate 0.0342   Epoch: 8   Global Step: 103220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:28:59,069-Speed 3057.16 samples/sec   Loss 8.4900   LearningRate 0.0342   Epoch: 8   Global Step: 103230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:29:02,543-Speed 2948.23 samples/sec   Loss 8.3501   LearningRate 0.0342   Epoch: 8   Global Step: 103240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:29:05,860-Speed 3087.44 samples/sec   Loss 8.3797   LearningRate 0.0341   Epoch: 8   Global Step: 103250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:29:09,313-Speed 2966.66 samples/sec   Loss 8.3535   LearningRate 0.0341   Epoch: 8   Global Step: 103260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:29:12,696-Speed 3027.42 samples/sec   Loss 8.3077   LearningRate 0.0341   Epoch: 8   Global Step: 103270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:29:16,091-Speed 3017.14 samples/sec   Loss 8.2474   LearningRate 0.0341   Epoch: 8   Global Step: 103280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:29:19,466-Speed 3035.61 samples/sec   Loss 8.4115   LearningRate 0.0341   Epoch: 8   Global Step: 103290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:29:22,786-Speed 3084.90 samples/sec   Loss 8.4038   LearningRate 0.0341   Epoch: 8   Global Step: 103300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:29:26,158-Speed 3036.93 samples/sec   Loss 8.4699   LearningRate 0.0341   Epoch: 8   Global Step: 103310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:29:29,551-Speed 3019.34 samples/sec   Loss 8.2388   LearningRate 0.0341   Epoch: 8   Global Step: 103320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:29:32,879-Speed 3077.70 samples/sec   Loss 8.3126   LearningRate 0.0341   Epoch: 8   Global Step: 103330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:29:36,346-Speed 2954.63 samples/sec   Loss 8.4372   LearningRate 0.0341   Epoch: 8   Global Step: 103340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:29:39,808-Speed 2958.24 samples/sec   Loss 8.2639   LearningRate 0.0341   Epoch: 8   Global Step: 103350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:29:43,229-Speed 2994.32 samples/sec   Loss 8.2713   LearningRate 0.0341   Epoch: 8   Global Step: 103360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:29:46,583-Speed 3053.46 samples/sec   Loss 8.3447   LearningRate 0.0341   Epoch: 8   Global Step: 103370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:29:49,996-Speed 3001.81 samples/sec   Loss 8.4997   LearningRate 0.0341   Epoch: 8   Global Step: 103380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:29:53,412-Speed 2998.23 samples/sec   Loss 8.4370   LearningRate 0.0341   Epoch: 8   Global Step: 103390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:29:56,801-Speed 3022.04 samples/sec   Loss 8.4282   LearningRate 0.0341   Epoch: 8   Global Step: 103400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:30:00,192-Speed 3020.71 samples/sec   Loss 8.3596   LearningRate 0.0341   Epoch: 8   Global Step: 103410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:30:03,564-Speed 3037.76 samples/sec   Loss 8.3298   LearningRate 0.0341   Epoch: 8   Global Step: 103420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:30:06,937-Speed 3036.73 samples/sec   Loss 8.4987   LearningRate 0.0341   Epoch: 8   Global Step: 103430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:30:10,331-Speed 3018.19 samples/sec   Loss 8.3036   LearningRate 0.0341   Epoch: 8   Global Step: 103440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:30:13,705-Speed 3036.14 samples/sec   Loss 8.4741   LearningRate 0.0341   Epoch: 8   Global Step: 103450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:30:17,008-Speed 3100.75 samples/sec   Loss 8.3621   LearningRate 0.0341   Epoch: 8   Global Step: 103460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:30:20,330-Speed 3083.92 samples/sec   Loss 8.3195   LearningRate 0.0340   Epoch: 8   Global Step: 103470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:30:23,696-Speed 3042.82 samples/sec   Loss 8.3024   LearningRate 0.0340   Epoch: 8   Global Step: 103480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:30:27,063-Speed 3041.59 samples/sec   Loss 8.3988   LearningRate 0.0340   Epoch: 8   Global Step: 103490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:30:30,434-Speed 3039.04 samples/sec   Loss 8.5404   LearningRate 0.0340   Epoch: 8   Global Step: 103500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:30:33,807-Speed 3036.77 samples/sec   Loss 8.4133   LearningRate 0.0340   Epoch: 8   Global Step: 103510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:30:37,213-Speed 3007.33 samples/sec   Loss 8.3633   LearningRate 0.0340   Epoch: 8   Global Step: 103520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:30:40,575-Speed 3046.31 samples/sec   Loss 8.3315   LearningRate 0.0340   Epoch: 8   Global Step: 103530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:30:43,940-Speed 3044.14 samples/sec   Loss 8.2543   LearningRate 0.0340   Epoch: 8   Global Step: 103540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:30:47,273-Speed 3073.73 samples/sec   Loss 8.3298   LearningRate 0.0340   Epoch: 8   Global Step: 103550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:30:50,672-Speed 3013.13 samples/sec   Loss 8.3395   LearningRate 0.0340   Epoch: 8   Global Step: 103560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:30:54,080-Speed 3005.52 samples/sec   Loss 8.3581   LearningRate 0.0340   Epoch: 8   Global Step: 103570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:30:57,470-Speed 3021.34 samples/sec   Loss 8.3681   LearningRate 0.0340   Epoch: 8   Global Step: 103580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:00,800-Speed 3075.60 samples/sec   Loss 8.1693   LearningRate 0.0340   Epoch: 8   Global Step: 103590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:04,114-Speed 3090.98 samples/sec   Loss 8.1778   LearningRate 0.0340   Epoch: 8   Global Step: 103600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:07,488-Speed 3035.46 samples/sec   Loss 8.3617   LearningRate 0.0340   Epoch: 8   Global Step: 103610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:31:10,812-Speed 3081.82 samples/sec   Loss 8.2665   LearningRate 0.0340   Epoch: 8   Global Step: 103620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:31:14,152-Speed 3066.89 samples/sec   Loss 8.4255   LearningRate 0.0340   Epoch: 8   Global Step: 103630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:31:17,485-Speed 3072.72 samples/sec   Loss 8.3229   LearningRate 0.0340   Epoch: 8   Global Step: 103640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:20,808-Speed 3082.52 samples/sec   Loss 8.4612   LearningRate 0.0340   Epoch: 8   Global Step: 103650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:24,171-Speed 3045.85 samples/sec   Loss 8.3678   LearningRate 0.0340   Epoch: 8   Global Step: 103660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:27,498-Speed 3078.67 samples/sec   Loss 8.3221   LearningRate 0.0340   Epoch: 8   Global Step: 103670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:30,865-Speed 3042.88 samples/sec   Loss 8.4352   LearningRate 0.0339   Epoch: 8   Global Step: 103680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:34,258-Speed 3018.49 samples/sec   Loss 8.3112   LearningRate 0.0339   Epoch: 8   Global Step: 103690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:37,602-Speed 3062.94 samples/sec   Loss 8.2966   LearningRate 0.0339   Epoch: 8   Global Step: 103700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:40,974-Speed 3037.80 samples/sec   Loss 8.3626   LearningRate 0.0339   Epoch: 8   Global Step: 103710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:44,314-Speed 3066.38 samples/sec   Loss 8.2993   LearningRate 0.0339   Epoch: 8   Global Step: 103720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:47,739-Speed 2990.17 samples/sec   Loss 8.3725   LearningRate 0.0339   Epoch: 8   Global Step: 103730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:31:51,138-Speed 3013.17 samples/sec   Loss 8.2950   LearningRate 0.0339   Epoch: 8   Global Step: 103740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:31:54,461-Speed 3082.90 samples/sec   Loss 8.4583   LearningRate 0.0339   Epoch: 8   Global Step: 103750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:31:57,814-Speed 3055.13 samples/sec   Loss 8.3601   LearningRate 0.0339   Epoch: 8   Global Step: 103760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:01,273-Speed 2961.06 samples/sec   Loss 8.3889   LearningRate 0.0339   Epoch: 8   Global Step: 103770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:04,714-Speed 2976.78 samples/sec   Loss 8.3642   LearningRate 0.0339   Epoch: 8   Global Step: 103780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:08,074-Speed 3048.67 samples/sec   Loss 8.3393   LearningRate 0.0339   Epoch: 8   Global Step: 103790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:11,450-Speed 3033.63 samples/sec   Loss 8.4609   LearningRate 0.0339   Epoch: 8   Global Step: 103800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:14,813-Speed 3045.98 samples/sec   Loss 8.4282   LearningRate 0.0339   Epoch: 8   Global Step: 103810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:18,214-Speed 3012.61 samples/sec   Loss 8.4043   LearningRate 0.0339   Epoch: 8   Global Step: 103820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:21,525-Speed 3093.80 samples/sec   Loss 8.3497   LearningRate 0.0339   Epoch: 8   Global Step: 103830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:24,859-Speed 3072.25 samples/sec   Loss 8.4127   LearningRate 0.0339   Epoch: 8   Global Step: 103840   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:32:28,251-Speed 3019.34 samples/sec   Loss 8.4132   LearningRate 0.0339   Epoch: 8   Global Step: 103850   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:32:31,731-Speed 2943.30 samples/sec   Loss 8.3773   LearningRate 0.0339   Epoch: 8   Global Step: 103860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:35,228-Speed 2929.49 samples/sec   Loss 8.2965   LearningRate 0.0339   Epoch: 8   Global Step: 103870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:38,636-Speed 3005.22 samples/sec   Loss 8.4393   LearningRate 0.0339   Epoch: 8   Global Step: 103880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:42,047-Speed 3003.48 samples/sec   Loss 8.3729   LearningRate 0.0338   Epoch: 8   Global Step: 103890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:45,474-Speed 2988.24 samples/sec   Loss 8.3211   LearningRate 0.0338   Epoch: 8   Global Step: 103900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:48,884-Speed 3003.93 samples/sec   Loss 8.2000   LearningRate 0.0338   Epoch: 8   Global Step: 103910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:52,322-Speed 2979.62 samples/sec   Loss 8.3035   LearningRate 0.0338   Epoch: 8   Global Step: 103920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:55,729-Speed 3006.38 samples/sec   Loss 8.3976   LearningRate 0.0338   Epoch: 8   Global Step: 103930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:32:59,132-Speed 3009.51 samples/sec   Loss 8.3710   LearningRate 0.0338   Epoch: 8   Global Step: 103940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:02,586-Speed 2965.64 samples/sec   Loss 8.3440   LearningRate 0.0338   Epoch: 8   Global Step: 103950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:05,981-Speed 3017.54 samples/sec   Loss 8.4830   LearningRate 0.0338   Epoch: 8   Global Step: 103960   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:33:09,334-Speed 3053.88 samples/sec   Loss 8.3957   LearningRate 0.0338   Epoch: 8   Global Step: 103970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:12,697-Speed 3046.14 samples/sec   Loss 8.4019   LearningRate 0.0338   Epoch: 8   Global Step: 103980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:16,094-Speed 3015.36 samples/sec   Loss 8.4672   LearningRate 0.0338   Epoch: 8   Global Step: 103990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:19,404-Speed 3094.04 samples/sec   Loss 8.3929   LearningRate 0.0338   Epoch: 8   Global Step: 104000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:22,806-Speed 3011.47 samples/sec   Loss 8.3760   LearningRate 0.0338   Epoch: 8   Global Step: 104010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:26,213-Speed 3005.83 samples/sec   Loss 8.4065   LearningRate 0.0338   Epoch: 8   Global Step: 104020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:29,610-Speed 3015.17 samples/sec   Loss 8.3463   LearningRate 0.0338   Epoch: 8   Global Step: 104030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:32,951-Speed 3065.97 samples/sec   Loss 8.2945   LearningRate 0.0338   Epoch: 8   Global Step: 104040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:36,388-Speed 2980.03 samples/sec   Loss 8.4185   LearningRate 0.0338   Epoch: 8   Global Step: 104050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:39,750-Speed 3048.47 samples/sec   Loss 8.4957   LearningRate 0.0338   Epoch: 8   Global Step: 104060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:43,138-Speed 3022.85 samples/sec   Loss 8.3054   LearningRate 0.0338   Epoch: 8   Global Step: 104070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:33:46,513-Speed 3034.55 samples/sec   Loss 8.3909   LearningRate 0.0338   Epoch: 8   Global Step: 104080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:49,907-Speed 3018.38 samples/sec   Loss 8.3783   LearningRate 0.0338   Epoch: 8   Global Step: 104090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:53,308-Speed 3011.79 samples/sec   Loss 8.4518   LearningRate 0.0338   Epoch: 8   Global Step: 104100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:33:56,696-Speed 3023.15 samples/sec   Loss 8.3408   LearningRate 0.0337   Epoch: 8   Global Step: 104110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:34:00,201-Speed 2923.16 samples/sec   Loss 8.3400   LearningRate 0.0337   Epoch: 8   Global Step: 104120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:34:03,549-Speed 3058.94 samples/sec   Loss 8.3222   LearningRate 0.0337   Epoch: 8   Global Step: 104130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:34:06,882-Speed 3073.60 samples/sec   Loss 8.3237   LearningRate 0.0337   Epoch: 8   Global Step: 104140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:34:10,286-Speed 3008.74 samples/sec   Loss 8.3180   LearningRate 0.0337   Epoch: 8   Global Step: 104150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:34:13,717-Speed 2985.21 samples/sec   Loss 8.4130   LearningRate 0.0337   Epoch: 8   Global Step: 104160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:34:17,112-Speed 3017.66 samples/sec   Loss 8.2950   LearningRate 0.0337   Epoch: 8   Global Step: 104170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:34:20,518-Speed 3007.40 samples/sec   Loss 8.4364   LearningRate 0.0337   Epoch: 8   Global Step: 104180   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:34:23,878-Speed 3047.78 samples/sec   Loss 8.3063   LearningRate 0.0337   Epoch: 8   Global Step: 104190   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:34:27,319-Speed 2976.87 samples/sec   Loss 8.2708   LearningRate 0.0337   Epoch: 8   Global Step: 104200   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:34:30,699-Speed 3030.32 samples/sec   Loss 8.3318   LearningRate 0.0337   Epoch: 8   Global Step: 104210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:34:34,181-Speed 2942.48 samples/sec   Loss 8.1946   LearningRate 0.0337   Epoch: 8   Global Step: 104220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:34:37,542-Speed 3047.32 samples/sec   Loss 8.4347   LearningRate 0.0337   Epoch: 8   Global Step: 104230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:34:40,881-Speed 3067.92 samples/sec   Loss 8.3593   LearningRate 0.0337   Epoch: 8   Global Step: 104240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:34:44,270-Speed 3022.41 samples/sec   Loss 8.3029   LearningRate 0.0337   Epoch: 8   Global Step: 104250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:34:47,615-Speed 3062.03 samples/sec   Loss 8.3705   LearningRate 0.0337   Epoch: 8   Global Step: 104260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:34:51,013-Speed 3014.24 samples/sec   Loss 8.3995   LearningRate 0.0337   Epoch: 8   Global Step: 104270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:34:54,421-Speed 3005.40 samples/sec   Loss 8.3593   LearningRate 0.0337   Epoch: 8   Global Step: 104280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:34:57,836-Speed 2998.79 samples/sec   Loss 8.3432   LearningRate 0.0337   Epoch: 8   Global Step: 104290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:35:01,298-Speed 2959.16 samples/sec   Loss 8.3815   LearningRate 0.0337   Epoch: 8   Global Step: 104300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:35:04,654-Speed 3051.72 samples/sec   Loss 8.4471   LearningRate 0.0337   Epoch: 8   Global Step: 104310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:35:08,024-Speed 3039.58 samples/sec   Loss 8.4124   LearningRate 0.0336   Epoch: 8   Global Step: 104320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:35:11,364-Speed 3066.86 samples/sec   Loss 8.3669   LearningRate 0.0336   Epoch: 8   Global Step: 104330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:35:14,681-Speed 3087.53 samples/sec   Loss 8.3089   LearningRate 0.0336   Epoch: 8   Global Step: 104340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:35:18,072-Speed 3020.95 samples/sec   Loss 8.3952   LearningRate 0.0336   Epoch: 8   Global Step: 104350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:35:21,476-Speed 3008.92 samples/sec   Loss 8.4204   LearningRate 0.0336   Epoch: 8   Global Step: 104360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:35:24,893-Speed 2997.84 samples/sec   Loss 8.5085   LearningRate 0.0336   Epoch: 8   Global Step: 104370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:35:28,325-Speed 2985.17 samples/sec   Loss 8.3412   LearningRate 0.0336   Epoch: 8   Global Step: 104380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:35:31,756-Speed 2985.02 samples/sec   Loss 8.3133   LearningRate 0.0336   Epoch: 8   Global Step: 104390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:35:35,124-Speed 3040.96 samples/sec   Loss 8.4328   LearningRate 0.0336   Epoch: 8   Global Step: 104400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:35:38,496-Speed 3037.77 samples/sec   Loss 8.3808   LearningRate 0.0336   Epoch: 8   Global Step: 104410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:35:41,898-Speed 3011.66 samples/sec   Loss 8.3856   LearningRate 0.0336   Epoch: 8   Global Step: 104420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:35:45,263-Speed 3043.97 samples/sec   Loss 8.4259   LearningRate 0.0336   Epoch: 8   Global Step: 104430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:35:48,667-Speed 3008.77 samples/sec   Loss 8.3449   LearningRate 0.0336   Epoch: 8   Global Step: 104440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:35:52,048-Speed 3029.86 samples/sec   Loss 8.4418   LearningRate 0.0336   Epoch: 8   Global Step: 104450   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:35:55,478-Speed 2985.90 samples/sec   Loss 8.2689   LearningRate 0.0336   Epoch: 8   Global Step: 104460   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:35:58,835-Speed 3050.74 samples/sec   Loss 8.4356   LearningRate 0.0336   Epoch: 8   Global Step: 104470   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:36:02,145-Speed 3094.67 samples/sec   Loss 8.3974   LearningRate 0.0336   Epoch: 8   Global Step: 104480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:05,584-Speed 2979.04 samples/sec   Loss 8.3510   LearningRate 0.0336   Epoch: 8   Global Step: 104490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:08,980-Speed 3015.67 samples/sec   Loss 8.4521   LearningRate 0.0336   Epoch: 8   Global Step: 104500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:12,347-Speed 3042.83 samples/sec   Loss 8.3690   LearningRate 0.0336   Epoch: 8   Global Step: 104510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:15,739-Speed 3018.82 samples/sec   Loss 8.3584   LearningRate 0.0336   Epoch: 8   Global Step: 104520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:19,096-Speed 3051.68 samples/sec   Loss 8.3761   LearningRate 0.0335   Epoch: 8   Global Step: 104530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:22,505-Speed 3005.03 samples/sec   Loss 8.3258   LearningRate 0.0335   Epoch: 8   Global Step: 104540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:25,995-Speed 2934.56 samples/sec   Loss 8.4083   LearningRate 0.0335   Epoch: 8   Global Step: 104550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:29,429-Speed 2982.66 samples/sec   Loss 8.4574   LearningRate 0.0335   Epoch: 8   Global Step: 104560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:32,892-Speed 2958.40 samples/sec   Loss 8.3775   LearningRate 0.0335   Epoch: 8   Global Step: 104570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:36,325-Speed 2983.71 samples/sec   Loss 8.3405   LearningRate 0.0335   Epoch: 8   Global Step: 104580   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 11:36:39,793-Speed 2953.94 samples/sec   Loss 8.4439   LearningRate 0.0335   Epoch: 8   Global Step: 104590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:43,246-Speed 2965.70 samples/sec   Loss 8.3292   LearningRate 0.0335   Epoch: 8   Global Step: 104600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:46,618-Speed 3037.59 samples/sec   Loss 8.2283   LearningRate 0.0335   Epoch: 8   Global Step: 104610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:50,075-Speed 2963.75 samples/sec   Loss 8.2956   LearningRate 0.0335   Epoch: 8   Global Step: 104620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:53,561-Speed 2937.92 samples/sec   Loss 8.3085   LearningRate 0.0335   Epoch: 8   Global Step: 104630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:36:56,999-Speed 2979.45 samples/sec   Loss 8.3731   LearningRate 0.0335   Epoch: 8   Global Step: 104640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:37:00,394-Speed 3016.98 samples/sec   Loss 8.4992   LearningRate 0.0335   Epoch: 8   Global Step: 104650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:37:03,733-Speed 3068.04 samples/sec   Loss 8.3397   LearningRate 0.0335   Epoch: 8   Global Step: 104660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:37:07,051-Speed 3086.69 samples/sec   Loss 8.3424   LearningRate 0.0335   Epoch: 8   Global Step: 104670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:37:10,435-Speed 3027.12 samples/sec   Loss 8.2725   LearningRate 0.0335   Epoch: 8   Global Step: 104680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:37:13,902-Speed 2954.58 samples/sec   Loss 8.4453   LearningRate 0.0335   Epoch: 8   Global Step: 104690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:37:17,283-Speed 3029.39 samples/sec   Loss 8.4584   LearningRate 0.0335   Epoch: 8   Global Step: 104700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:37:20,683-Speed 3012.38 samples/sec   Loss 8.3908   LearningRate 0.0335   Epoch: 8   Global Step: 104710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:37:24,029-Speed 3061.62 samples/sec   Loss 8.4931   LearningRate 0.0335   Epoch: 8   Global Step: 104720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:37:27,425-Speed 3016.10 samples/sec   Loss 8.3973   LearningRate 0.0335   Epoch: 8   Global Step: 104730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:37:30,853-Speed 2987.68 samples/sec   Loss 8.4040   LearningRate 0.0335   Epoch: 8   Global Step: 104740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:37:34,307-Speed 2965.28 samples/sec   Loss 8.4401   LearningRate 0.0334   Epoch: 8   Global Step: 104750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:37:37,743-Speed 2981.09 samples/sec   Loss 8.3098   LearningRate 0.0334   Epoch: 8   Global Step: 104760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:37:41,149-Speed 3007.37 samples/sec   Loss 8.3909   LearningRate 0.0334   Epoch: 8   Global Step: 104770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:37:44,585-Speed 2981.74 samples/sec   Loss 8.5689   LearningRate 0.0334   Epoch: 8   Global Step: 104780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:37:48,052-Speed 2954.60 samples/sec   Loss 8.4053   LearningRate 0.0334   Epoch: 8   Global Step: 104790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:37:51,487-Speed 2981.58 samples/sec   Loss 8.2511   LearningRate 0.0334   Epoch: 8   Global Step: 104800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:37:54,844-Speed 3051.05 samples/sec   Loss 8.2185   LearningRate 0.0334   Epoch: 8   Global Step: 104810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:37:58,254-Speed 3003.54 samples/sec   Loss 8.3651   LearningRate 0.0334   Epoch: 8   Global Step: 104820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:38:01,640-Speed 3025.42 samples/sec   Loss 8.3026   LearningRate 0.0334   Epoch: 8   Global Step: 104830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:38:04,977-Speed 3069.01 samples/sec   Loss 8.4076   LearningRate 0.0334   Epoch: 8   Global Step: 104840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:38:08,319-Speed 3065.37 samples/sec   Loss 8.4816   LearningRate 0.0334   Epoch: 8   Global Step: 104850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 11:38:11,754-Speed 2981.94 samples/sec   Loss 8.5012   LearningRate 0.0334   Epoch: 8   Global Step: 104860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:15,053-Speed 3104.85 samples/sec   Loss 8.3228   LearningRate 0.0334   Epoch: 8   Global Step: 104870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:18,955-Speed 2625.25 samples/sec   Loss 8.3778   LearningRate 0.0334   Epoch: 8   Global Step: 104880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:22,412-Speed 2962.36 samples/sec   Loss 8.4669   LearningRate 0.0334   Epoch: 8   Global Step: 104890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:25,786-Speed 3037.13 samples/sec   Loss 8.3416   LearningRate 0.0334   Epoch: 8   Global Step: 104900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:30,439-Speed 2201.73 samples/sec   Loss 8.4113   LearningRate 0.0334   Epoch: 8   Global Step: 104910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:34,539-Speed 2497.67 samples/sec   Loss 8.4957   LearningRate 0.0334   Epoch: 8   Global Step: 104920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:38,023-Speed 2940.12 samples/sec   Loss 8.3436   LearningRate 0.0334   Epoch: 8   Global Step: 104930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:41,400-Speed 3033.09 samples/sec   Loss 8.4082   LearningRate 0.0334   Epoch: 8   Global Step: 104940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:44,785-Speed 3026.28 samples/sec   Loss 8.3296   LearningRate 0.0334   Epoch: 8   Global Step: 104950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:48,288-Speed 2923.89 samples/sec   Loss 8.1805   LearningRate 0.0333   Epoch: 8   Global Step: 104960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:51,698-Speed 3004.25 samples/sec   Loss 8.3375   LearningRate 0.0333   Epoch: 8   Global Step: 104970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:55,068-Speed 3039.46 samples/sec   Loss 8.3310   LearningRate 0.0333   Epoch: 8   Global Step: 104980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:38:58,462-Speed 3017.41 samples/sec   Loss 8.3095   LearningRate 0.0333   Epoch: 8   Global Step: 104990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:39:01,821-Speed 3050.19 samples/sec   Loss 8.5139   LearningRate 0.0333   Epoch: 8   Global Step: 105000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 11:39:05,161-Speed 3066.30 samples/sec   Loss 8.4247   LearningRate 0.0333   Epoch: 8   Global Step: 105010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:08,540-Speed 3031.73 samples/sec   Loss 8.4797   LearningRate 0.0333   Epoch: 8   Global Step: 105020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:11,905-Speed 3044.01 samples/sec   Loss 8.2274   LearningRate 0.0333   Epoch: 8   Global Step: 105030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:15,280-Speed 3035.03 samples/sec   Loss 8.3400   LearningRate 0.0333   Epoch: 8   Global Step: 105040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:18,589-Speed 3097.86 samples/sec   Loss 8.3813   LearningRate 0.0333   Epoch: 8   Global Step: 105050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:21,943-Speed 3054.09 samples/sec   Loss 8.2386   LearningRate 0.0333   Epoch: 8   Global Step: 105060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:39:25,251-Speed 3096.88 samples/sec   Loss 8.3594   LearningRate 0.0333   Epoch: 8   Global Step: 105070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:39:28,605-Speed 3053.61 samples/sec   Loss 8.4718   LearningRate 0.0333   Epoch: 8   Global Step: 105080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:39:31,962-Speed 3051.15 samples/sec   Loss 8.3518   LearningRate 0.0333   Epoch: 8   Global Step: 105090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:35,383-Speed 2994.56 samples/sec   Loss 8.2172   LearningRate 0.0333   Epoch: 8   Global Step: 105100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:38,768-Speed 3025.38 samples/sec   Loss 8.3437   LearningRate 0.0333   Epoch: 8   Global Step: 105110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:42,209-Speed 2977.34 samples/sec   Loss 8.5053   LearningRate 0.0333   Epoch: 8   Global Step: 105120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:45,577-Speed 3040.83 samples/sec   Loss 8.4245   LearningRate 0.0333   Epoch: 8   Global Step: 105130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:48,965-Speed 3023.31 samples/sec   Loss 8.4332   LearningRate 0.0333   Epoch: 8   Global Step: 105140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:52,383-Speed 2996.99 samples/sec   Loss 8.2948   LearningRate 0.0333   Epoch: 8   Global Step: 105150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:55,713-Speed 3075.34 samples/sec   Loss 8.1726   LearningRate 0.0333   Epoch: 8   Global Step: 105160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:39:59,118-Speed 3009.67 samples/sec   Loss 8.4554   LearningRate 0.0333   Epoch: 8   Global Step: 105170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:40:02,539-Speed 2993.76 samples/sec   Loss 8.4310   LearningRate 0.0332   Epoch: 8   Global Step: 105180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:40:05,837-Speed 3105.61 samples/sec   Loss 8.3679   LearningRate 0.0332   Epoch: 8   Global Step: 105190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:40:09,196-Speed 3050.04 samples/sec   Loss 8.4255   LearningRate 0.0332   Epoch: 8   Global Step: 105200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:40:12,619-Speed 2992.59 samples/sec   Loss 8.3949   LearningRate 0.0332   Epoch: 8   Global Step: 105210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:40:16,071-Speed 2966.79 samples/sec   Loss 8.3291   LearningRate 0.0332   Epoch: 8   Global Step: 105220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:40:19,515-Speed 2974.47 samples/sec   Loss 8.2949   LearningRate 0.0332   Epoch: 8   Global Step: 105230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:40:22,859-Speed 3062.62 samples/sec   Loss 8.4525   LearningRate 0.0332   Epoch: 8   Global Step: 105240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:40:26,162-Speed 3101.55 samples/sec   Loss 8.3903   LearningRate 0.0332   Epoch: 8   Global Step: 105250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:40:29,585-Speed 2992.59 samples/sec   Loss 8.3047   LearningRate 0.0332   Epoch: 8   Global Step: 105260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:40:34,175-Speed 2231.33 samples/sec   Loss 8.3342   LearningRate 0.0332   Epoch: 8   Global Step: 105270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:40:37,506-Speed 3075.50 samples/sec   Loss 8.3083   LearningRate 0.0332   Epoch: 8   Global Step: 105280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:40:40,816-Speed 3094.36 samples/sec   Loss 8.3567   LearningRate 0.0332   Epoch: 8   Global Step: 105290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:40:44,266-Speed 2969.20 samples/sec   Loss 8.3041   LearningRate 0.0332   Epoch: 8   Global Step: 105300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:40:47,622-Speed 3051.70 samples/sec   Loss 8.4367   LearningRate 0.0332   Epoch: 8   Global Step: 105310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:40:50,980-Speed 3050.60 samples/sec   Loss 8.3533   LearningRate 0.0332   Epoch: 8   Global Step: 105320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:40:54,438-Speed 2962.82 samples/sec   Loss 8.4140   LearningRate 0.0332   Epoch: 8   Global Step: 105330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:40:57,860-Speed 2993.22 samples/sec   Loss 8.4350   LearningRate 0.0332   Epoch: 8   Global Step: 105340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:01,179-Speed 3085.30 samples/sec   Loss 8.2361   LearningRate 0.0332   Epoch: 8   Global Step: 105350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:04,524-Speed 3062.92 samples/sec   Loss 8.4529   LearningRate 0.0332   Epoch: 8   Global Step: 105360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:07,877-Speed 3054.74 samples/sec   Loss 8.4754   LearningRate 0.0332   Epoch: 8   Global Step: 105370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:11,259-Speed 3028.64 samples/sec   Loss 8.2687   LearningRate 0.0332   Epoch: 8   Global Step: 105380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:14,581-Speed 3083.14 samples/sec   Loss 8.4799   LearningRate 0.0331   Epoch: 8   Global Step: 105390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:17,934-Speed 3055.09 samples/sec   Loss 8.3817   LearningRate 0.0331   Epoch: 8   Global Step: 105400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:21,320-Speed 3024.97 samples/sec   Loss 8.3879   LearningRate 0.0331   Epoch: 8   Global Step: 105410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:24,742-Speed 2993.15 samples/sec   Loss 8.3133   LearningRate 0.0331   Epoch: 8   Global Step: 105420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:28,109-Speed 3041.83 samples/sec   Loss 8.4969   LearningRate 0.0331   Epoch: 8   Global Step: 105430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:31,507-Speed 3014.70 samples/sec   Loss 8.4011   LearningRate 0.0331   Epoch: 8   Global Step: 105440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:34,907-Speed 3012.45 samples/sec   Loss 8.4398   LearningRate 0.0331   Epoch: 8   Global Step: 105450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:38,239-Speed 3074.85 samples/sec   Loss 8.3299   LearningRate 0.0331   Epoch: 8   Global Step: 105460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:41,651-Speed 3001.94 samples/sec   Loss 8.3589   LearningRate 0.0331   Epoch: 8   Global Step: 105470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:45,071-Speed 2995.33 samples/sec   Loss 8.5656   LearningRate 0.0331   Epoch: 8   Global Step: 105480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:41:48,432-Speed 3047.53 samples/sec   Loss 8.3936   LearningRate 0.0331   Epoch: 8   Global Step: 105490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:41:51,820-Speed 3022.57 samples/sec   Loss 8.4175   LearningRate 0.0331   Epoch: 8   Global Step: 105500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:41:55,172-Speed 3056.57 samples/sec   Loss 8.3557   LearningRate 0.0331   Epoch: 8   Global Step: 105510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:41:58,502-Speed 3075.32 samples/sec   Loss 8.3498   LearningRate 0.0331   Epoch: 8   Global Step: 105520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:42:01,858-Speed 3052.57 samples/sec   Loss 8.2326   LearningRate 0.0331   Epoch: 8   Global Step: 105530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:42:05,226-Speed 3041.12 samples/sec   Loss 8.2428   LearningRate 0.0331   Epoch: 8   Global Step: 105540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:42:08,565-Speed 3067.49 samples/sec   Loss 8.2626   LearningRate 0.0331   Epoch: 8   Global Step: 105550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:42:11,955-Speed 3021.24 samples/sec   Loss 8.3782   LearningRate 0.0331   Epoch: 8   Global Step: 105560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:42:15,399-Speed 2974.85 samples/sec   Loss 8.4923   LearningRate 0.0331   Epoch: 8   Global Step: 105570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:42:18,884-Speed 2938.71 samples/sec   Loss 8.3532   LearningRate 0.0331   Epoch: 8   Global Step: 105580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:42:22,416-Speed 2900.23 samples/sec   Loss 8.2302   LearningRate 0.0331   Epoch: 8   Global Step: 105590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:42:25,810-Speed 3017.63 samples/sec   Loss 8.4824   LearningRate 0.0331   Epoch: 8   Global Step: 105600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:42:29,287-Speed 2946.18 samples/sec   Loss 8.4594   LearningRate 0.0330   Epoch: 8   Global Step: 105610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:42:32,705-Speed 2996.90 samples/sec   Loss 8.3068   LearningRate 0.0330   Epoch: 8   Global Step: 105620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:42:36,094-Speed 3022.46 samples/sec   Loss 8.2995   LearningRate 0.0330   Epoch: 8   Global Step: 105630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:42:39,553-Speed 2961.16 samples/sec   Loss 8.2940   LearningRate 0.0330   Epoch: 8   Global Step: 105640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:42:42,899-Speed 3061.40 samples/sec   Loss 8.3750   LearningRate 0.0330   Epoch: 8   Global Step: 105650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:42:46,382-Speed 2940.89 samples/sec   Loss 8.3362   LearningRate 0.0330   Epoch: 8   Global Step: 105660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:42:49,756-Speed 3035.98 samples/sec   Loss 8.4181   LearningRate 0.0330   Epoch: 8   Global Step: 105670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:42:53,084-Speed 3077.88 samples/sec   Loss 8.4611   LearningRate 0.0330   Epoch: 8   Global Step: 105680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:42:56,419-Speed 3071.22 samples/sec   Loss 8.4204   LearningRate 0.0330   Epoch: 8   Global Step: 105690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:42:59,840-Speed 2994.65 samples/sec   Loss 8.4863   LearningRate 0.0330   Epoch: 8   Global Step: 105700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:43:03,243-Speed 3009.71 samples/sec   Loss 8.2893   LearningRate 0.0330   Epoch: 8   Global Step: 105710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:43:06,713-Speed 2951.93 samples/sec   Loss 8.3029   LearningRate 0.0330   Epoch: 8   Global Step: 105720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:43:10,137-Speed 2991.16 samples/sec   Loss 8.4602   LearningRate 0.0330   Epoch: 8   Global Step: 105730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:43:13,505-Speed 3041.92 samples/sec   Loss 8.4796   LearningRate 0.0330   Epoch: 8   Global Step: 105740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:43:16,902-Speed 3014.28 samples/sec   Loss 8.5562   LearningRate 0.0330   Epoch: 8   Global Step: 105750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:43:20,275-Speed 3037.22 samples/sec   Loss 8.3307   LearningRate 0.0330   Epoch: 8   Global Step: 105760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:43:23,645-Speed 3040.03 samples/sec   Loss 8.3713   LearningRate 0.0330   Epoch: 8   Global Step: 105770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:43:27,084-Speed 2978.54 samples/sec   Loss 8.4637   LearningRate 0.0330   Epoch: 8   Global Step: 105780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:43:30,488-Speed 3009.17 samples/sec   Loss 8.3265   LearningRate 0.0330   Epoch: 8   Global Step: 105790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:43:33,918-Speed 2986.06 samples/sec   Loss 8.3541   LearningRate 0.0330   Epoch: 8   Global Step: 105800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:43:37,353-Speed 2982.43 samples/sec   Loss 8.3154   LearningRate 0.0330   Epoch: 8   Global Step: 105810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:43:40,715-Speed 3045.99 samples/sec   Loss 8.3044   LearningRate 0.0330   Epoch: 8   Global Step: 105820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:43:44,118-Speed 3010.04 samples/sec   Loss 8.3685   LearningRate 0.0329   Epoch: 8   Global Step: 105830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:43:47,501-Speed 3027.44 samples/sec   Loss 8.4660   LearningRate 0.0329   Epoch: 8   Global Step: 105840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:43:50,912-Speed 3002.92 samples/sec   Loss 8.4000   LearningRate 0.0329   Epoch: 8   Global Step: 105850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:43:54,368-Speed 2963.85 samples/sec   Loss 8.3763   LearningRate 0.0329   Epoch: 8   Global Step: 105860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:43:57,793-Speed 2990.98 samples/sec   Loss 8.4040   LearningRate 0.0329   Epoch: 8   Global Step: 105870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:44:01,185-Speed 3019.48 samples/sec   Loss 8.5079   LearningRate 0.0329   Epoch: 8   Global Step: 105880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:44:04,502-Speed 3087.51 samples/sec   Loss 8.4770   LearningRate 0.0329   Epoch: 8   Global Step: 105890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:44:07,856-Speed 3054.89 samples/sec   Loss 8.3890   LearningRate 0.0329   Epoch: 8   Global Step: 105900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:44:11,266-Speed 3004.03 samples/sec   Loss 8.4746   LearningRate 0.0329   Epoch: 8   Global Step: 105910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:44:14,640-Speed 3036.00 samples/sec   Loss 8.4222   LearningRate 0.0329   Epoch: 8   Global Step: 105920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:44:18,037-Speed 3014.56 samples/sec   Loss 8.4480   LearningRate 0.0329   Epoch: 8   Global Step: 105930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:44:21,398-Speed 3047.80 samples/sec   Loss 8.3017   LearningRate 0.0329   Epoch: 8   Global Step: 105940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:44:24,767-Speed 3040.00 samples/sec   Loss 8.3152   LearningRate 0.0329   Epoch: 8   Global Step: 105950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:44:28,211-Speed 2974.27 samples/sec   Loss 8.3072   LearningRate 0.0329   Epoch: 8   Global Step: 105960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:44:31,669-Speed 2961.94 samples/sec   Loss 8.4262   LearningRate 0.0329   Epoch: 8   Global Step: 105970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:44:35,099-Speed 2986.75 samples/sec   Loss 8.4315   LearningRate 0.0329   Epoch: 8   Global Step: 105980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:44:38,519-Speed 2994.75 samples/sec   Loss 8.4790   LearningRate 0.0329   Epoch: 8   Global Step: 105990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:44:41,909-Speed 3021.11 samples/sec   Loss 8.2829   LearningRate 0.0329   Epoch: 8   Global Step: 106000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:44:45,237-Speed 3078.19 samples/sec   Loss 8.3115   LearningRate 0.0329   Epoch: 8   Global Step: 106010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:44:48,601-Speed 3045.29 samples/sec   Loss 8.2486   LearningRate 0.0329   Epoch: 8   Global Step: 106020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:44:51,959-Speed 3050.16 samples/sec   Loss 8.4155   LearningRate 0.0329   Epoch: 8   Global Step: 106030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:44:55,323-Speed 3045.10 samples/sec   Loss 8.2517   LearningRate 0.0328   Epoch: 8   Global Step: 106040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:44:58,657-Speed 3072.61 samples/sec   Loss 8.3458   LearningRate 0.0328   Epoch: 8   Global Step: 106050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:45:02,132-Speed 2947.10 samples/sec   Loss 8.3112   LearningRate 0.0328   Epoch: 8   Global Step: 106060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:45:05,492-Speed 3048.50 samples/sec   Loss 8.3836   LearningRate 0.0328   Epoch: 8   Global Step: 106070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:45:08,837-Speed 3062.31 samples/sec   Loss 8.4290   LearningRate 0.0328   Epoch: 8   Global Step: 106080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:45:12,171-Speed 3072.46 samples/sec   Loss 8.5578   LearningRate 0.0328   Epoch: 8   Global Step: 106090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:45:15,561-Speed 3021.07 samples/sec   Loss 8.4265   LearningRate 0.0328   Epoch: 8   Global Step: 106100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:45:18,966-Speed 3008.61 samples/sec   Loss 8.3535   LearningRate 0.0328   Epoch: 8   Global Step: 106110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:45:22,325-Speed 3049.60 samples/sec   Loss 8.5030   LearningRate 0.0328   Epoch: 8   Global Step: 106120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:45:25,722-Speed 3015.22 samples/sec   Loss 8.3383   LearningRate 0.0328   Epoch: 8   Global Step: 106130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:45:29,144-Speed 2992.96 samples/sec   Loss 8.3041   LearningRate 0.0328   Epoch: 8   Global Step: 106140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:45:32,515-Speed 3038.82 samples/sec   Loss 8.4235   LearningRate 0.0328   Epoch: 8   Global Step: 106150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:45:35,885-Speed 3039.23 samples/sec   Loss 8.2727   LearningRate 0.0328   Epoch: 8   Global Step: 106160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:45:39,205-Speed 3085.62 samples/sec   Loss 8.4072   LearningRate 0.0328   Epoch: 8   Global Step: 106170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:45:42,534-Speed 3076.22 samples/sec   Loss 8.4614   LearningRate 0.0328   Epoch: 8   Global Step: 106180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:45:45,888-Speed 3053.93 samples/sec   Loss 8.3970   LearningRate 0.0328   Epoch: 8   Global Step: 106190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:45:49,282-Speed 3018.36 samples/sec   Loss 8.4768   LearningRate 0.0328   Epoch: 8   Global Step: 106200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:45:52,731-Speed 2969.48 samples/sec   Loss 8.2469   LearningRate 0.0328   Epoch: 8   Global Step: 106210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:45:56,083-Speed 3056.07 samples/sec   Loss 8.3956   LearningRate 0.0328   Epoch: 8   Global Step: 106220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:45:59,461-Speed 3031.70 samples/sec   Loss 8.3123   LearningRate 0.0328   Epoch: 8   Global Step: 106230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:02,853-Speed 3019.78 samples/sec   Loss 8.2950   LearningRate 0.0328   Epoch: 8   Global Step: 106240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:06,252-Speed 3013.72 samples/sec   Loss 8.2799   LearningRate 0.0328   Epoch: 8   Global Step: 106250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:09,638-Speed 3024.99 samples/sec   Loss 8.3159   LearningRate 0.0327   Epoch: 8   Global Step: 106260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:12,981-Speed 3064.43 samples/sec   Loss 8.2547   LearningRate 0.0327   Epoch: 8   Global Step: 106270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:46:16,323-Speed 3064.64 samples/sec   Loss 8.3135   LearningRate 0.0327   Epoch: 8   Global Step: 106280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:19,679-Speed 3052.33 samples/sec   Loss 8.3488   LearningRate 0.0327   Epoch: 8   Global Step: 106290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:23,066-Speed 3023.46 samples/sec   Loss 8.1961   LearningRate 0.0327   Epoch: 8   Global Step: 106300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:26,448-Speed 3028.38 samples/sec   Loss 8.2742   LearningRate 0.0327   Epoch: 8   Global Step: 106310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:29,815-Speed 3042.79 samples/sec   Loss 8.4260   LearningRate 0.0327   Epoch: 8   Global Step: 106320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:33,191-Speed 3034.34 samples/sec   Loss 8.4399   LearningRate 0.0327   Epoch: 8   Global Step: 106330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:36,519-Speed 3077.47 samples/sec   Loss 8.3924   LearningRate 0.0327   Epoch: 8   Global Step: 106340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:39,863-Speed 3063.24 samples/sec   Loss 8.3917   LearningRate 0.0327   Epoch: 8   Global Step: 106350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:43,216-Speed 3054.56 samples/sec   Loss 8.3402   LearningRate 0.0327   Epoch: 8   Global Step: 106360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:46,623-Speed 3006.02 samples/sec   Loss 8.2924   LearningRate 0.0327   Epoch: 8   Global Step: 106370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:49,950-Speed 3078.90 samples/sec   Loss 8.3202   LearningRate 0.0327   Epoch: 8   Global Step: 106380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:53,343-Speed 3019.33 samples/sec   Loss 8.2276   LearningRate 0.0327   Epoch: 8   Global Step: 106390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:46:56,724-Speed 3029.28 samples/sec   Loss 8.3062   LearningRate 0.0327   Epoch: 8   Global Step: 106400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:47:00,122-Speed 3014.64 samples/sec   Loss 8.3512   LearningRate 0.0327   Epoch: 8   Global Step: 106410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:47:03,511-Speed 3022.14 samples/sec   Loss 8.4169   LearningRate 0.0327   Epoch: 8   Global Step: 106420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:47:06,909-Speed 3014.40 samples/sec   Loss 8.3839   LearningRate 0.0327   Epoch: 8   Global Step: 106430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:47:10,275-Speed 3043.14 samples/sec   Loss 8.2129   LearningRate 0.0327   Epoch: 8   Global Step: 106440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:47:13,639-Speed 3044.99 samples/sec   Loss 8.3329   LearningRate 0.0327   Epoch: 8   Global Step: 106450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:47:16,962-Speed 3081.58 samples/sec   Loss 8.3344   LearningRate 0.0327   Epoch: 8   Global Step: 106460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:47:20,368-Speed 3007.31 samples/sec   Loss 8.1591   LearningRate 0.0327   Epoch: 8   Global Step: 106470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:47:23,684-Speed 3089.17 samples/sec   Loss 8.1150   LearningRate 0.0326   Epoch: 8   Global Step: 106480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:47:27,024-Speed 3066.37 samples/sec   Loss 8.2457   LearningRate 0.0326   Epoch: 8   Global Step: 106490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:47:30,399-Speed 3035.12 samples/sec   Loss 8.3155   LearningRate 0.0326   Epoch: 8   Global Step: 106500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:47:33,791-Speed 3020.12 samples/sec   Loss 8.4462   LearningRate 0.0326   Epoch: 8   Global Step: 106510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:47:37,186-Speed 3016.66 samples/sec   Loss 8.4687   LearningRate 0.0326   Epoch: 8   Global Step: 106520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:47:40,559-Speed 3037.05 samples/sec   Loss 8.4997   LearningRate 0.0326   Epoch: 8   Global Step: 106530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:47:43,872-Speed 3091.53 samples/sec   Loss 8.2712   LearningRate 0.0326   Epoch: 8   Global Step: 106540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:47:47,233-Speed 3047.52 samples/sec   Loss 8.2860   LearningRate 0.0326   Epoch: 8   Global Step: 106550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:47:50,589-Speed 3051.90 samples/sec   Loss 8.4165   LearningRate 0.0326   Epoch: 8   Global Step: 106560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:47:53,995-Speed 3007.68 samples/sec   Loss 8.4278   LearningRate 0.0326   Epoch: 8   Global Step: 106570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:47:57,420-Speed 2989.79 samples/sec   Loss 8.4290   LearningRate 0.0326   Epoch: 8   Global Step: 106580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:48:00,868-Speed 2970.86 samples/sec   Loss 8.2365   LearningRate 0.0326   Epoch: 8   Global Step: 106590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:48:04,213-Speed 3062.36 samples/sec   Loss 8.2448   LearningRate 0.0326   Epoch: 8   Global Step: 106600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:48:07,680-Speed 2954.44 samples/sec   Loss 8.3832   LearningRate 0.0326   Epoch: 8   Global Step: 106610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:48:11,027-Speed 3059.90 samples/sec   Loss 8.4636   LearningRate 0.0326   Epoch: 8   Global Step: 106620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:48:14,410-Speed 3027.52 samples/sec   Loss 8.3498   LearningRate 0.0326   Epoch: 8   Global Step: 106630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:48:17,763-Speed 3055.47 samples/sec   Loss 8.4316   LearningRate 0.0326   Epoch: 8   Global Step: 106640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:48:21,077-Speed 3091.16 samples/sec   Loss 8.2073   LearningRate 0.0326   Epoch: 8   Global Step: 106650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:48:24,377-Speed 3103.82 samples/sec   Loss 8.3756   LearningRate 0.0326   Epoch: 8   Global Step: 106660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:48:27,697-Speed 3085.38 samples/sec   Loss 8.2070   LearningRate 0.0326   Epoch: 8   Global Step: 106670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:48:31,083-Speed 3025.04 samples/sec   Loss 8.3273   LearningRate 0.0326   Epoch: 8   Global Step: 106680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:48:34,447-Speed 3044.95 samples/sec   Loss 8.3373   LearningRate 0.0325   Epoch: 8   Global Step: 106690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:48:37,799-Speed 3055.82 samples/sec   Loss 8.3629   LearningRate 0.0325   Epoch: 8   Global Step: 106700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:48:41,182-Speed 3027.62 samples/sec   Loss 8.3752   LearningRate 0.0325   Epoch: 8   Global Step: 106710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:48:44,504-Speed 3083.71 samples/sec   Loss 8.3325   LearningRate 0.0325   Epoch: 8   Global Step: 106720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:48:47,871-Speed 3041.70 samples/sec   Loss 8.2100   LearningRate 0.0325   Epoch: 8   Global Step: 106730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:48:51,268-Speed 3015.75 samples/sec   Loss 8.2116   LearningRate 0.0325   Epoch: 8   Global Step: 106740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:48:54,693-Speed 2990.27 samples/sec   Loss 8.2430   LearningRate 0.0325   Epoch: 8   Global Step: 106750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:48:58,144-Speed 2967.99 samples/sec   Loss 8.2225   LearningRate 0.0325   Epoch: 8   Global Step: 106760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:49:01,474-Speed 3076.02 samples/sec   Loss 8.2855   LearningRate 0.0325   Epoch: 8   Global Step: 106770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:49:04,868-Speed 3017.92 samples/sec   Loss 8.3148   LearningRate 0.0325   Epoch: 8   Global Step: 106780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:49:08,214-Speed 3061.54 samples/sec   Loss 8.3864   LearningRate 0.0325   Epoch: 8   Global Step: 106790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:49:11,584-Speed 3039.54 samples/sec   Loss 8.2750   LearningRate 0.0325   Epoch: 8   Global Step: 106800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:49:14,965-Speed 3030.06 samples/sec   Loss 8.3455   LearningRate 0.0325   Epoch: 8   Global Step: 106810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:49:18,326-Speed 3047.40 samples/sec   Loss 8.3256   LearningRate 0.0325   Epoch: 8   Global Step: 106820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:49:21,756-Speed 2986.31 samples/sec   Loss 8.3668   LearningRate 0.0325   Epoch: 8   Global Step: 106830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:49:25,130-Speed 3035.91 samples/sec   Loss 8.3313   LearningRate 0.0325   Epoch: 8   Global Step: 106840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:49:28,537-Speed 3006.51 samples/sec   Loss 8.3565   LearningRate 0.0325   Epoch: 8   Global Step: 106850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:49:31,852-Speed 3090.18 samples/sec   Loss 8.3030   LearningRate 0.0325   Epoch: 8   Global Step: 106860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:49:35,231-Speed 3031.36 samples/sec   Loss 8.3067   LearningRate 0.0325   Epoch: 8   Global Step: 106870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:49:38,583-Speed 3055.54 samples/sec   Loss 8.1398   LearningRate 0.0325   Epoch: 8   Global Step: 106880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:49:41,964-Speed 3029.76 samples/sec   Loss 8.1684   LearningRate 0.0325   Epoch: 8   Global Step: 106890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:49:45,336-Speed 3037.39 samples/sec   Loss 8.4750   LearningRate 0.0325   Epoch: 8   Global Step: 106900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:49:48,775-Speed 2978.34 samples/sec   Loss 8.2103   LearningRate 0.0324   Epoch: 8   Global Step: 106910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:49:52,091-Speed 3088.90 samples/sec   Loss 8.3099   LearningRate 0.0324   Epoch: 8   Global Step: 106920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:49:55,493-Speed 3010.74 samples/sec   Loss 8.2390   LearningRate 0.0324   Epoch: 8   Global Step: 106930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:49:58,819-Speed 3080.54 samples/sec   Loss 8.2743   LearningRate 0.0324   Epoch: 8   Global Step: 106940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:50:02,240-Speed 2994.00 samples/sec   Loss 8.1747   LearningRate 0.0324   Epoch: 8   Global Step: 106950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:50:05,619-Speed 3031.10 samples/sec   Loss 8.3005   LearningRate 0.0324   Epoch: 8   Global Step: 106960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:50:08,984-Speed 3044.29 samples/sec   Loss 8.2751   LearningRate 0.0324   Epoch: 8   Global Step: 106970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:50:12,337-Speed 3054.08 samples/sec   Loss 8.2783   LearningRate 0.0324   Epoch: 8   Global Step: 106980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:50:15,744-Speed 3006.05 samples/sec   Loss 8.3639   LearningRate 0.0324   Epoch: 8   Global Step: 106990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:50:19,112-Speed 3041.67 samples/sec   Loss 8.1129   LearningRate 0.0324   Epoch: 8   Global Step: 107000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:50:22,474-Speed 3046.83 samples/sec   Loss 8.4432   LearningRate 0.0324   Epoch: 8   Global Step: 107010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:50:25,808-Speed 3072.55 samples/sec   Loss 8.1291   LearningRate 0.0324   Epoch: 8   Global Step: 107020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:50:29,140-Speed 3074.79 samples/sec   Loss 8.3487   LearningRate 0.0324   Epoch: 8   Global Step: 107030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:50:32,513-Speed 3036.63 samples/sec   Loss 8.2880   LearningRate 0.0324   Epoch: 8   Global Step: 107040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:50:35,837-Speed 3081.13 samples/sec   Loss 8.1055   LearningRate 0.0324   Epoch: 8   Global Step: 107050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:50:39,220-Speed 3027.39 samples/sec   Loss 8.1859   LearningRate 0.0324   Epoch: 8   Global Step: 107060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:50:42,566-Speed 3061.22 samples/sec   Loss 8.2138   LearningRate 0.0324   Epoch: 8   Global Step: 107070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:50:45,917-Speed 3056.70 samples/sec   Loss 8.3923   LearningRate 0.0324   Epoch: 8   Global Step: 107080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:50:49,236-Speed 3086.01 samples/sec   Loss 8.3020   LearningRate 0.0324   Epoch: 8   Global Step: 107090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:50:52,562-Speed 3079.74 samples/sec   Loss 8.2845   LearningRate 0.0324   Epoch: 8   Global Step: 107100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:50:55,882-Speed 3085.15 samples/sec   Loss 8.3217   LearningRate 0.0324   Epoch: 8   Global Step: 107110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:50:59,217-Speed 3071.84 samples/sec   Loss 8.2860   LearningRate 0.0324   Epoch: 8   Global Step: 107120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:51:02,572-Speed 3053.14 samples/sec   Loss 8.2920   LearningRate 0.0323   Epoch: 8   Global Step: 107130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:51:06,000-Speed 2987.64 samples/sec   Loss 8.3685   LearningRate 0.0323   Epoch: 8   Global Step: 107140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:51:09,370-Speed 3039.57 samples/sec   Loss 8.3707   LearningRate 0.0323   Epoch: 8   Global Step: 107150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:51:12,689-Speed 3085.88 samples/sec   Loss 8.3198   LearningRate 0.0323   Epoch: 8   Global Step: 107160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:51:16,063-Speed 3035.71 samples/sec   Loss 8.3946   LearningRate 0.0323   Epoch: 8   Global Step: 107170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:51:19,452-Speed 3022.42 samples/sec   Loss 8.3750   LearningRate 0.0323   Epoch: 8   Global Step: 107180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:51:22,855-Speed 3009.95 samples/sec   Loss 8.3301   LearningRate 0.0323   Epoch: 8   Global Step: 107190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:51:26,226-Speed 3038.47 samples/sec   Loss 8.1573   LearningRate 0.0323   Epoch: 8   Global Step: 107200   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 11:51:29,666-Speed 2978.08 samples/sec   Loss 8.2520   LearningRate 0.0323   Epoch: 8   Global Step: 107210   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 11:51:33,052-Speed 3024.62 samples/sec   Loss 8.2901   LearningRate 0.0323   Epoch: 8   Global Step: 107220   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 11:51:36,464-Speed 3002.29 samples/sec   Loss 8.2845   LearningRate 0.0323   Epoch: 8   Global Step: 107230   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 11:51:39,825-Speed 3046.94 samples/sec   Loss 8.4480   LearningRate 0.0323   Epoch: 8   Global Step: 107240   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 11:51:43,174-Speed 3060.69 samples/sec   Loss 8.1762   LearningRate 0.0323   Epoch: 8   Global Step: 107250   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 11:51:46,558-Speed 3026.82 samples/sec   Loss 8.2943   LearningRate 0.0323   Epoch: 8   Global Step: 107260   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 11:51:49,913-Speed 3052.53 samples/sec   Loss 8.2283   LearningRate 0.0323   Epoch: 8   Global Step: 107270   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 11:51:53,238-Speed 3080.74 samples/sec   Loss 8.2214   LearningRate 0.0323   Epoch: 8   Global Step: 107280   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 11:51:56,642-Speed 3008.54 samples/sec   Loss 8.3514   LearningRate 0.0323   Epoch: 8   Global Step: 107290   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 11:51:59,984-Speed 3065.48 samples/sec   Loss 8.2806   LearningRate 0.0323   Epoch: 8   Global Step: 107300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:52:03,391-Speed 3006.03 samples/sec   Loss 8.2026   LearningRate 0.0323   Epoch: 8   Global Step: 107310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:52:06,713-Speed 3083.11 samples/sec   Loss 8.4055   LearningRate 0.0323   Epoch: 8   Global Step: 107320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:52:10,067-Speed 3054.05 samples/sec   Loss 8.3627   LearningRate 0.0323   Epoch: 8   Global Step: 107330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:52:13,450-Speed 3027.66 samples/sec   Loss 8.3472   LearningRate 0.0323   Epoch: 8   Global Step: 107340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:52:16,853-Speed 3010.43 samples/sec   Loss 8.1917   LearningRate 0.0322   Epoch: 8   Global Step: 107350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:52:20,205-Speed 3055.67 samples/sec   Loss 8.3397   LearningRate 0.0322   Epoch: 8   Global Step: 107360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:52:23,586-Speed 3029.62 samples/sec   Loss 8.2820   LearningRate 0.0322   Epoch: 8   Global Step: 107370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:52:27,000-Speed 2999.68 samples/sec   Loss 8.3853   LearningRate 0.0322   Epoch: 8   Global Step: 107380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:52:30,383-Speed 3028.00 samples/sec   Loss 8.2871   LearningRate 0.0322   Epoch: 8   Global Step: 107390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:52:33,738-Speed 3052.87 samples/sec   Loss 8.3361   LearningRate 0.0322   Epoch: 8   Global Step: 107400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:52:37,079-Speed 3065.45 samples/sec   Loss 8.3488   LearningRate 0.0322   Epoch: 8   Global Step: 107410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:52:40,398-Speed 3086.41 samples/sec   Loss 8.3841   LearningRate 0.0322   Epoch: 8   Global Step: 107420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:52:43,812-Speed 3000.66 samples/sec   Loss 8.2051   LearningRate 0.0322   Epoch: 8   Global Step: 107430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:52:47,233-Speed 2993.34 samples/sec   Loss 8.2928   LearningRate 0.0322   Epoch: 8   Global Step: 107440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:52:50,625-Speed 3019.91 samples/sec   Loss 8.2754   LearningRate 0.0322   Epoch: 8   Global Step: 107450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:52:53,951-Speed 3080.06 samples/sec   Loss 8.2567   LearningRate 0.0322   Epoch: 8   Global Step: 107460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:52:57,306-Speed 3052.22 samples/sec   Loss 8.3264   LearningRate 0.0322   Epoch: 8   Global Step: 107470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:53:00,684-Speed 3032.76 samples/sec   Loss 8.2015   LearningRate 0.0322   Epoch: 8   Global Step: 107480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:53:04,016-Speed 3074.20 samples/sec   Loss 8.2890   LearningRate 0.0322   Epoch: 8   Global Step: 107490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:53:07,427-Speed 3002.45 samples/sec   Loss 8.2957   LearningRate 0.0322   Epoch: 8   Global Step: 107500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:53:10,840-Speed 3000.99 samples/sec   Loss 8.2069   LearningRate 0.0322   Epoch: 8   Global Step: 107510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:53:14,261-Speed 2994.63 samples/sec   Loss 8.2661   LearningRate 0.0322   Epoch: 8   Global Step: 107520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:53:17,640-Speed 3030.35 samples/sec   Loss 8.2936   LearningRate 0.0322   Epoch: 8   Global Step: 107530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:53:21,081-Speed 2976.82 samples/sec   Loss 8.3029   LearningRate 0.0322   Epoch: 8   Global Step: 107540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:53:24,514-Speed 2983.72 samples/sec   Loss 8.1970   LearningRate 0.0322   Epoch: 8   Global Step: 107550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:53:27,912-Speed 3014.05 samples/sec   Loss 8.2767   LearningRate 0.0322   Epoch: 8   Global Step: 107560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:53:31,245-Speed 3073.46 samples/sec   Loss 8.1979   LearningRate 0.0321   Epoch: 8   Global Step: 107570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:53:34,585-Speed 3066.54 samples/sec   Loss 8.3397   LearningRate 0.0321   Epoch: 8   Global Step: 107580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:53:37,935-Speed 3058.20 samples/sec   Loss 8.3029   LearningRate 0.0321   Epoch: 8   Global Step: 107590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:53:41,298-Speed 3045.45 samples/sec   Loss 8.2531   LearningRate 0.0321   Epoch: 8   Global Step: 107600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:53:44,688-Speed 3022.05 samples/sec   Loss 8.3065   LearningRate 0.0321   Epoch: 8   Global Step: 107610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:53:48,058-Speed 3039.19 samples/sec   Loss 8.1721   LearningRate 0.0321   Epoch: 8   Global Step: 107620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:53:51,578-Speed 2910.22 samples/sec   Loss 8.2839   LearningRate 0.0321   Epoch: 8   Global Step: 107630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:53:55,010-Speed 2984.71 samples/sec   Loss 8.2582   LearningRate 0.0321   Epoch: 8   Global Step: 107640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:53:58,422-Speed 3002.42 samples/sec   Loss 8.3714   LearningRate 0.0321   Epoch: 8   Global Step: 107650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:54:01,863-Speed 2977.19 samples/sec   Loss 8.2494   LearningRate 0.0321   Epoch: 8   Global Step: 107660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:54:05,211-Speed 3058.85 samples/sec   Loss 8.2747   LearningRate 0.0321   Epoch: 8   Global Step: 107670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:54:08,554-Speed 3064.34 samples/sec   Loss 8.3767   LearningRate 0.0321   Epoch: 8   Global Step: 107680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:54:11,909-Speed 3052.97 samples/sec   Loss 8.2489   LearningRate 0.0321   Epoch: 8   Global Step: 107690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:54:15,249-Speed 3067.41 samples/sec   Loss 8.1264   LearningRate 0.0321   Epoch: 8   Global Step: 107700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:54:18,607-Speed 3050.24 samples/sec   Loss 8.2338   LearningRate 0.0321   Epoch: 8   Global Step: 107710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:54:21,965-Speed 3049.97 samples/sec   Loss 8.2076   LearningRate 0.0321   Epoch: 8   Global Step: 107720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:54:25,307-Speed 3065.07 samples/sec   Loss 8.3363   LearningRate 0.0321   Epoch: 8   Global Step: 107730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:54:28,685-Speed 3032.31 samples/sec   Loss 8.3494   LearningRate 0.0321   Epoch: 8   Global Step: 107740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:54:32,075-Speed 3021.22 samples/sec   Loss 8.1879   LearningRate 0.0321   Epoch: 8   Global Step: 107750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:54:35,464-Speed 3022.86 samples/sec   Loss 8.2249   LearningRate 0.0321   Epoch: 8   Global Step: 107760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:54:38,789-Speed 3080.42 samples/sec   Loss 8.2947   LearningRate 0.0321   Epoch: 8   Global Step: 107770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:54:42,136-Speed 3059.87 samples/sec   Loss 8.2756   LearningRate 0.0321   Epoch: 8   Global Step: 107780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:54:45,510-Speed 3036.07 samples/sec   Loss 8.3645   LearningRate 0.0320   Epoch: 8   Global Step: 107790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:54:48,944-Speed 2983.09 samples/sec   Loss 8.1605   LearningRate 0.0320   Epoch: 8   Global Step: 107800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:54:52,278-Speed 3071.64 samples/sec   Loss 8.2754   LearningRate 0.0320   Epoch: 8   Global Step: 107810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:54:55,596-Speed 3087.37 samples/sec   Loss 8.3044   LearningRate 0.0320   Epoch: 8   Global Step: 107820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:54:58,941-Speed 3062.32 samples/sec   Loss 8.3037   LearningRate 0.0320   Epoch: 8   Global Step: 107830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:55:02,303-Speed 3045.91 samples/sec   Loss 8.2464   LearningRate 0.0320   Epoch: 8   Global Step: 107840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:55:05,647-Speed 3063.15 samples/sec   Loss 8.3117   LearningRate 0.0320   Epoch: 8   Global Step: 107850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:55:09,000-Speed 3055.31 samples/sec   Loss 8.3376   LearningRate 0.0320   Epoch: 8   Global Step: 107860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:55:12,327-Speed 3078.67 samples/sec   Loss 8.2461   LearningRate 0.0320   Epoch: 8   Global Step: 107870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:55:15,718-Speed 3020.14 samples/sec   Loss 8.1526   LearningRate 0.0320   Epoch: 8   Global Step: 107880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:55:19,110-Speed 3020.03 samples/sec   Loss 8.2219   LearningRate 0.0320   Epoch: 8   Global Step: 107890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:55:22,527-Speed 2998.01 samples/sec   Loss 8.0381   LearningRate 0.0320   Epoch: 8   Global Step: 107900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:55:26,006-Speed 2943.48 samples/sec   Loss 8.1331   LearningRate 0.0320   Epoch: 8   Global Step: 107910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:55:29,363-Speed 3051.73 samples/sec   Loss 8.2536   LearningRate 0.0320   Epoch: 8   Global Step: 107920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:55:32,737-Speed 3035.67 samples/sec   Loss 8.3082   LearningRate 0.0320   Epoch: 8   Global Step: 107930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:55:36,059-Speed 3083.66 samples/sec   Loss 8.3433   LearningRate 0.0320   Epoch: 8   Global Step: 107940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:55:39,532-Speed 2948.85 samples/sec   Loss 8.2117   LearningRate 0.0320   Epoch: 8   Global Step: 107950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:55:42,963-Speed 2985.32 samples/sec   Loss 8.2906   LearningRate 0.0320   Epoch: 8   Global Step: 107960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:55:46,321-Speed 3050.14 samples/sec   Loss 8.1570   LearningRate 0.0320   Epoch: 8   Global Step: 107970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:55:49,710-Speed 3022.60 samples/sec   Loss 8.2042   LearningRate 0.0320   Epoch: 8   Global Step: 107980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:55:53,090-Speed 3030.08 samples/sec   Loss 8.2546   LearningRate 0.0320   Epoch: 8   Global Step: 107990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:55:56,460-Speed 3039.76 samples/sec   Loss 8.2044   LearningRate 0.0320   Epoch: 8   Global Step: 108000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:55:59,837-Speed 3033.51 samples/sec   Loss 8.3937   LearningRate 0.0319   Epoch: 8   Global Step: 108010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:56:03,248-Speed 3002.68 samples/sec   Loss 8.3429   LearningRate 0.0319   Epoch: 8   Global Step: 108020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:56:06,661-Speed 3000.84 samples/sec   Loss 8.1239   LearningRate 0.0319   Epoch: 8   Global Step: 108030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:56:10,103-Speed 2975.35 samples/sec   Loss 8.2066   LearningRate 0.0319   Epoch: 8   Global Step: 108040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:56:13,490-Speed 3024.46 samples/sec   Loss 8.2326   LearningRate 0.0319   Epoch: 8   Global Step: 108050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:56:16,947-Speed 2962.79 samples/sec   Loss 8.3221   LearningRate 0.0319   Epoch: 8   Global Step: 108060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:56:20,351-Speed 3009.09 samples/sec   Loss 8.2082   LearningRate 0.0319   Epoch: 8   Global Step: 108070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:56:23,727-Speed 3033.66 samples/sec   Loss 8.2116   LearningRate 0.0319   Epoch: 8   Global Step: 108080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:56:27,060-Speed 3073.01 samples/sec   Loss 8.1529   LearningRate 0.0319   Epoch: 8   Global Step: 108090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:56:30,421-Speed 3048.18 samples/sec   Loss 8.2950   LearningRate 0.0319   Epoch: 8   Global Step: 108100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:56:33,772-Speed 3056.27 samples/sec   Loss 8.2765   LearningRate 0.0319   Epoch: 8   Global Step: 108110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:56:37,159-Speed 3023.97 samples/sec   Loss 8.3084   LearningRate 0.0319   Epoch: 8   Global Step: 108120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:56:40,597-Speed 2979.89 samples/sec   Loss 8.1905   LearningRate 0.0319   Epoch: 8   Global Step: 108130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:56:44,071-Speed 2948.53 samples/sec   Loss 8.2921   LearningRate 0.0319   Epoch: 8   Global Step: 108140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:56:47,399-Speed 3077.14 samples/sec   Loss 8.2492   LearningRate 0.0319   Epoch: 8   Global Step: 108150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:56:50,792-Speed 3018.92 samples/sec   Loss 8.2467   LearningRate 0.0319   Epoch: 8   Global Step: 108160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:56:54,218-Speed 2990.47 samples/sec   Loss 8.1547   LearningRate 0.0319   Epoch: 8   Global Step: 108170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:56:57,595-Speed 3032.68 samples/sec   Loss 8.1989   LearningRate 0.0319   Epoch: 8   Global Step: 108180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:57:01,025-Speed 2986.64 samples/sec   Loss 8.1249   LearningRate 0.0319   Epoch: 8   Global Step: 108190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:57:04,406-Speed 3029.19 samples/sec   Loss 8.1214   LearningRate 0.0319   Epoch: 8   Global Step: 108200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:57:07,785-Speed 3031.47 samples/sec   Loss 8.3171   LearningRate 0.0319   Epoch: 8   Global Step: 108210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:57:11,213-Speed 2988.06 samples/sec   Loss 8.1367   LearningRate 0.0319   Epoch: 8   Global Step: 108220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:57:14,554-Speed 3066.22 samples/sec   Loss 8.3665   LearningRate 0.0318   Epoch: 8   Global Step: 108230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:57:17,910-Speed 3052.36 samples/sec   Loss 8.3566   LearningRate 0.0318   Epoch: 8   Global Step: 108240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:57:21,233-Speed 3081.93 samples/sec   Loss 8.0752   LearningRate 0.0318   Epoch: 8   Global Step: 108250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:57:24,581-Speed 3059.20 samples/sec   Loss 8.1850   LearningRate 0.0318   Epoch: 8   Global Step: 108260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:57:27,942-Speed 3048.18 samples/sec   Loss 8.1653   LearningRate 0.0318   Epoch: 8   Global Step: 108270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:57:31,322-Speed 3030.14 samples/sec   Loss 8.2574   LearningRate 0.0318   Epoch: 8   Global Step: 108280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:57:34,665-Speed 3064.09 samples/sec   Loss 8.3527   LearningRate 0.0318   Epoch: 8   Global Step: 108290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:57:38,081-Speed 2999.03 samples/sec   Loss 8.3117   LearningRate 0.0318   Epoch: 8   Global Step: 108300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:57:41,469-Speed 3022.85 samples/sec   Loss 8.2450   LearningRate 0.0318   Epoch: 8   Global Step: 108310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:57:44,810-Speed 3066.01 samples/sec   Loss 8.2962   LearningRate 0.0318   Epoch: 8   Global Step: 108320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:57:48,158-Speed 3059.07 samples/sec   Loss 8.2579   LearningRate 0.0318   Epoch: 8   Global Step: 108330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:57:51,580-Speed 2993.07 samples/sec   Loss 8.2389   LearningRate 0.0318   Epoch: 8   Global Step: 108340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:57:55,036-Speed 2964.10 samples/sec   Loss 8.1075   LearningRate 0.0318   Epoch: 8   Global Step: 108350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:57:58,422-Speed 3025.06 samples/sec   Loss 8.2285   LearningRate 0.0318   Epoch: 8   Global Step: 108360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:01,770-Speed 3059.37 samples/sec   Loss 8.2285   LearningRate 0.0318   Epoch: 8   Global Step: 108370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:05,091-Speed 3084.45 samples/sec   Loss 8.2509   LearningRate 0.0318   Epoch: 8   Global Step: 108380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:08,481-Speed 3021.37 samples/sec   Loss 8.2481   LearningRate 0.0318   Epoch: 8   Global Step: 108390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:11,917-Speed 2981.11 samples/sec   Loss 8.3313   LearningRate 0.0318   Epoch: 8   Global Step: 108400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:58:15,326-Speed 3004.96 samples/sec   Loss 8.2676   LearningRate 0.0318   Epoch: 8   Global Step: 108410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:58:18,774-Speed 2970.89 samples/sec   Loss 8.3374   LearningRate 0.0318   Epoch: 8   Global Step: 108420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:58:22,107-Speed 3073.40 samples/sec   Loss 8.2802   LearningRate 0.0318   Epoch: 8   Global Step: 108430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:25,412-Speed 3098.85 samples/sec   Loss 8.2817   LearningRate 0.0318   Epoch: 8   Global Step: 108440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:28,784-Speed 3037.88 samples/sec   Loss 8.0996   LearningRate 0.0317   Epoch: 8   Global Step: 108450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:32,153-Speed 3040.32 samples/sec   Loss 8.1778   LearningRate 0.0317   Epoch: 8   Global Step: 108460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:35,643-Speed 2935.26 samples/sec   Loss 8.2580   LearningRate 0.0317   Epoch: 8   Global Step: 108470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:38,952-Speed 3095.28 samples/sec   Loss 8.2023   LearningRate 0.0317   Epoch: 8   Global Step: 108480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:42,280-Speed 3077.91 samples/sec   Loss 8.1966   LearningRate 0.0317   Epoch: 8   Global Step: 108490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:45,613-Speed 3073.12 samples/sec   Loss 8.1138   LearningRate 0.0317   Epoch: 8   Global Step: 108500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:49,007-Speed 3017.51 samples/sec   Loss 8.1542   LearningRate 0.0317   Epoch: 8   Global Step: 108510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:52,363-Speed 3052.03 samples/sec   Loss 8.1771   LearningRate 0.0317   Epoch: 8   Global Step: 108520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:58:55,771-Speed 3005.76 samples/sec   Loss 8.3994   LearningRate 0.0317   Epoch: 8   Global Step: 108530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:58:59,130-Speed 3050.20 samples/sec   Loss 8.1788   LearningRate 0.0317   Epoch: 8   Global Step: 108540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:59:02,455-Speed 3080.35 samples/sec   Loss 8.2618   LearningRate 0.0317   Epoch: 8   Global Step: 108550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 11:59:05,790-Speed 3071.38 samples/sec   Loss 8.2395   LearningRate 0.0317   Epoch: 8   Global Step: 108560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:59:09,153-Speed 3045.40 samples/sec   Loss 8.1553   LearningRate 0.0317   Epoch: 8   Global Step: 108570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:59:12,553-Speed 3012.88 samples/sec   Loss 8.2405   LearningRate 0.0317   Epoch: 8   Global Step: 108580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:59:15,865-Speed 3092.73 samples/sec   Loss 8.2335   LearningRate 0.0317   Epoch: 8   Global Step: 108590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:59:19,269-Speed 3009.27 samples/sec   Loss 8.3411   LearningRate 0.0317   Epoch: 8   Global Step: 108600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:59:22,589-Speed 3084.91 samples/sec   Loss 8.1903   LearningRate 0.0317   Epoch: 8   Global Step: 108610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:59:26,046-Speed 2962.83 samples/sec   Loss 8.2113   LearningRate 0.0317   Epoch: 8   Global Step: 108620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:59:29,477-Speed 2985.16 samples/sec   Loss 8.0401   LearningRate 0.0317   Epoch: 8   Global Step: 108630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:59:32,810-Speed 3073.17 samples/sec   Loss 8.2723   LearningRate 0.0317   Epoch: 8   Global Step: 108640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:59:36,214-Speed 3009.79 samples/sec   Loss 8.1459   LearningRate 0.0317   Epoch: 8   Global Step: 108650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:59:39,647-Speed 2983.40 samples/sec   Loss 8.1232   LearningRate 0.0317   Epoch: 8   Global Step: 108660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:59:43,066-Speed 2995.54 samples/sec   Loss 8.1251   LearningRate 0.0316   Epoch: 8   Global Step: 108670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 11:59:46,506-Speed 2977.53 samples/sec   Loss 8.1670   LearningRate 0.0316   Epoch: 8   Global Step: 108680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:59:49,984-Speed 2945.29 samples/sec   Loss 8.1454   LearningRate 0.0316   Epoch: 8   Global Step: 108690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:59:53,450-Speed 2954.94 samples/sec   Loss 8.1132   LearningRate 0.0316   Epoch: 8   Global Step: 108700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 11:59:56,766-Speed 3088.42 samples/sec   Loss 8.2254   LearningRate 0.0316   Epoch: 8   Global Step: 108710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:00,114-Speed 3060.32 samples/sec   Loss 8.1794   LearningRate 0.0316   Epoch: 8   Global Step: 108720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:03,547-Speed 2983.54 samples/sec   Loss 8.1295   LearningRate 0.0316   Epoch: 8   Global Step: 108730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:06,935-Speed 3022.91 samples/sec   Loss 8.0872   LearningRate 0.0316   Epoch: 8   Global Step: 108740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:10,358-Speed 2992.57 samples/sec   Loss 8.2341   LearningRate 0.0316   Epoch: 8   Global Step: 108750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:13,821-Speed 2957.71 samples/sec   Loss 8.2089   LearningRate 0.0316   Epoch: 8   Global Step: 108760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:17,193-Speed 3037.33 samples/sec   Loss 8.1327   LearningRate 0.0316   Epoch: 8   Global Step: 108770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:20,481-Speed 3116.23 samples/sec   Loss 8.3017   LearningRate 0.0316   Epoch: 8   Global Step: 108780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:23,921-Speed 2977.11 samples/sec   Loss 8.1711   LearningRate 0.0316   Epoch: 8   Global Step: 108790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:27,304-Speed 3028.05 samples/sec   Loss 8.2164   LearningRate 0.0316   Epoch: 8   Global Step: 108800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:30,630-Speed 3079.36 samples/sec   Loss 8.1870   LearningRate 0.0316   Epoch: 8   Global Step: 108810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:34,088-Speed 2962.09 samples/sec   Loss 8.2363   LearningRate 0.0316   Epoch: 8   Global Step: 108820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:37,454-Speed 3042.86 samples/sec   Loss 8.1180   LearningRate 0.0316   Epoch: 8   Global Step: 108830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:40,798-Speed 3063.32 samples/sec   Loss 8.0705   LearningRate 0.0316   Epoch: 8   Global Step: 108840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:44,198-Speed 3012.50 samples/sec   Loss 8.1895   LearningRate 0.0316   Epoch: 8   Global Step: 108850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:47,654-Speed 2963.92 samples/sec   Loss 8.2076   LearningRate 0.0316   Epoch: 8   Global Step: 108860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:51,071-Speed 2998.25 samples/sec   Loss 8.1837   LearningRate 0.0316   Epoch: 8   Global Step: 108870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:00:54,442-Speed 3037.73 samples/sec   Loss 8.1658   LearningRate 0.0316   Epoch: 8   Global Step: 108880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:00:57,829-Speed 3024.99 samples/sec   Loss 8.2405   LearningRate 0.0315   Epoch: 8   Global Step: 108890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:01:01,157-Speed 3077.97 samples/sec   Loss 8.0409   LearningRate 0.0315   Epoch: 8   Global Step: 108900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:01:04,531-Speed 3035.75 samples/sec   Loss 8.3393   LearningRate 0.0315   Epoch: 8   Global Step: 108910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:01:07,919-Speed 3023.10 samples/sec   Loss 8.1264   LearningRate 0.0315   Epoch: 8   Global Step: 108920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:11,312-Speed 3018.56 samples/sec   Loss 8.3057   LearningRate 0.0315   Epoch: 8   Global Step: 108930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:14,711-Speed 3013.85 samples/sec   Loss 8.3130   LearningRate 0.0315   Epoch: 8   Global Step: 108940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:18,175-Speed 2957.10 samples/sec   Loss 8.2936   LearningRate 0.0315   Epoch: 8   Global Step: 108950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:21,565-Speed 3021.31 samples/sec   Loss 8.2283   LearningRate 0.0315   Epoch: 8   Global Step: 108960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:25,014-Speed 2970.49 samples/sec   Loss 8.1998   LearningRate 0.0315   Epoch: 8   Global Step: 108970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:28,389-Speed 3034.67 samples/sec   Loss 8.0439   LearningRate 0.0315   Epoch: 8   Global Step: 108980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:31,697-Speed 3096.77 samples/sec   Loss 8.2589   LearningRate 0.0315   Epoch: 8   Global Step: 108990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:35,028-Speed 3074.53 samples/sec   Loss 8.1849   LearningRate 0.0315   Epoch: 8   Global Step: 109000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:38,430-Speed 3010.89 samples/sec   Loss 8.3067   LearningRate 0.0315   Epoch: 8   Global Step: 109010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:41,742-Speed 3092.27 samples/sec   Loss 8.1906   LearningRate 0.0315   Epoch: 8   Global Step: 109020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:01:45,133-Speed 3020.84 samples/sec   Loss 8.3072   LearningRate 0.0315   Epoch: 8   Global Step: 109030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:01:48,475-Speed 3065.57 samples/sec   Loss 8.2530   LearningRate 0.0315   Epoch: 8   Global Step: 109040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:51,791-Speed 3088.64 samples/sec   Loss 8.2116   LearningRate 0.0315   Epoch: 8   Global Step: 109050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:55,101-Speed 3094.03 samples/sec   Loss 8.1133   LearningRate 0.0315   Epoch: 8   Global Step: 109060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:01:58,414-Speed 3092.80 samples/sec   Loss 8.1817   LearningRate 0.0315   Epoch: 8   Global Step: 109070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:01,850-Speed 2980.93 samples/sec   Loss 8.1815   LearningRate 0.0315   Epoch: 8   Global Step: 109080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:05,210-Speed 3048.48 samples/sec   Loss 8.1759   LearningRate 0.0315   Epoch: 8   Global Step: 109090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:08,628-Speed 2996.69 samples/sec   Loss 8.2069   LearningRate 0.0315   Epoch: 8   Global Step: 109100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:11,973-Speed 3061.86 samples/sec   Loss 8.2244   LearningRate 0.0314   Epoch: 8   Global Step: 109110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:15,399-Speed 2989.18 samples/sec   Loss 8.1934   LearningRate 0.0314   Epoch: 8   Global Step: 109120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:18,764-Speed 3044.43 samples/sec   Loss 8.1290   LearningRate 0.0314   Epoch: 8   Global Step: 109130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:22,131-Speed 3042.63 samples/sec   Loss 8.0436   LearningRate 0.0314   Epoch: 8   Global Step: 109140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:25,494-Speed 3044.73 samples/sec   Loss 8.1833   LearningRate 0.0314   Epoch: 8   Global Step: 109150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:28,907-Speed 3001.23 samples/sec   Loss 8.2759   LearningRate 0.0314   Epoch: 8   Global Step: 109160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:32,330-Speed 2993.00 samples/sec   Loss 8.2526   LearningRate 0.0314   Epoch: 8   Global Step: 109170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:35,724-Speed 3017.66 samples/sec   Loss 8.1957   LearningRate 0.0314   Epoch: 8   Global Step: 109180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:39,182-Speed 2961.86 samples/sec   Loss 8.0448   LearningRate 0.0314   Epoch: 8   Global Step: 109190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:42,563-Speed 3029.86 samples/sec   Loss 8.1450   LearningRate 0.0314   Epoch: 8   Global Step: 109200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:45,934-Speed 3037.99 samples/sec   Loss 8.1359   LearningRate 0.0314   Epoch: 8   Global Step: 109210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:49,298-Speed 3044.68 samples/sec   Loss 8.0537   LearningRate 0.0314   Epoch: 8   Global Step: 109220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:52,710-Speed 3002.20 samples/sec   Loss 8.1499   LearningRate 0.0314   Epoch: 8   Global Step: 109230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:02:56,123-Speed 3000.74 samples/sec   Loss 8.2720   LearningRate 0.0314   Epoch: 8   Global Step: 109240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:02:59,456-Speed 3074.02 samples/sec   Loss 8.1400   LearningRate 0.0314   Epoch: 8   Global Step: 109250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:03:02,807-Speed 3056.68 samples/sec   Loss 8.1902   LearningRate 0.0314   Epoch: 8   Global Step: 109260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:03:06,215-Speed 3005.26 samples/sec   Loss 8.2738   LearningRate 0.0314   Epoch: 8   Global Step: 109270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:03:09,611-Speed 3016.57 samples/sec   Loss 8.2683   LearningRate 0.0314   Epoch: 8   Global Step: 109280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:03:13,064-Speed 2966.13 samples/sec   Loss 8.2098   LearningRate 0.0314   Epoch: 8   Global Step: 109290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:03:16,456-Speed 3020.26 samples/sec   Loss 8.0632   LearningRate 0.0314   Epoch: 8   Global Step: 109300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:03:19,816-Speed 3048.14 samples/sec   Loss 8.1074   LearningRate 0.0314   Epoch: 8   Global Step: 109310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:03:23,273-Speed 2963.45 samples/sec   Loss 8.0976   LearningRate 0.0314   Epoch: 8   Global Step: 109320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:03:26,676-Speed 3008.96 samples/sec   Loss 8.0945   LearningRate 0.0313   Epoch: 8   Global Step: 109330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:03:30,051-Speed 3035.83 samples/sec   Loss 8.0309   LearningRate 0.0313   Epoch: 8   Global Step: 109340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:03:33,472-Speed 2994.28 samples/sec   Loss 8.2040   LearningRate 0.0313   Epoch: 8   Global Step: 109350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:03:36,877-Speed 3007.80 samples/sec   Loss 8.1612   LearningRate 0.0313   Epoch: 8   Global Step: 109360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:03:40,399-Speed 2908.78 samples/sec   Loss 8.2387   LearningRate 0.0313   Epoch: 8   Global Step: 109370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:03:43,736-Speed 3068.90 samples/sec   Loss 8.1002   LearningRate 0.0313   Epoch: 8   Global Step: 109380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:03:47,162-Speed 2989.93 samples/sec   Loss 8.1752   LearningRate 0.0313   Epoch: 8   Global Step: 109390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:03:50,604-Speed 2976.81 samples/sec   Loss 8.2513   LearningRate 0.0313   Epoch: 8   Global Step: 109400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:03:54,020-Speed 2997.53 samples/sec   Loss 8.2073   LearningRate 0.0313   Epoch: 8   Global Step: 109410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:03:57,428-Speed 3005.51 samples/sec   Loss 8.2160   LearningRate 0.0313   Epoch: 8   Global Step: 109420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:04:00,810-Speed 3029.24 samples/sec   Loss 8.1156   LearningRate 0.0313   Epoch: 8   Global Step: 109430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:04:04,253-Speed 2974.51 samples/sec   Loss 8.0175   LearningRate 0.0313   Epoch: 8   Global Step: 109440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:04:07,683-Speed 2986.40 samples/sec   Loss 8.2908   LearningRate 0.0313   Epoch: 8   Global Step: 109450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:04:11,073-Speed 3021.51 samples/sec   Loss 8.3230   LearningRate 0.0313   Epoch: 8   Global Step: 109460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:04:14,518-Speed 2972.93 samples/sec   Loss 8.2331   LearningRate 0.0313   Epoch: 8   Global Step: 109470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:04:17,918-Speed 3012.73 samples/sec   Loss 8.0439   LearningRate 0.0313   Epoch: 8   Global Step: 109480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:04:21,329-Speed 3003.55 samples/sec   Loss 8.1279   LearningRate 0.0313   Epoch: 8   Global Step: 109490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:04:24,684-Speed 3052.98 samples/sec   Loss 8.2297   LearningRate 0.0313   Epoch: 8   Global Step: 109500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:04:28,113-Speed 2987.18 samples/sec   Loss 8.0766   LearningRate 0.0313   Epoch: 8   Global Step: 109510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:04:31,482-Speed 3039.90 samples/sec   Loss 8.2408   LearningRate 0.0313   Epoch: 8   Global Step: 109520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:04:34,885-Speed 3010.54 samples/sec   Loss 8.1156   LearningRate 0.0313   Epoch: 8   Global Step: 109530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:04:38,288-Speed 3010.23 samples/sec   Loss 8.1300   LearningRate 0.0313   Epoch: 8   Global Step: 109540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:04:41,736-Speed 2970.18 samples/sec   Loss 8.1475   LearningRate 0.0312   Epoch: 8   Global Step: 109550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:04:45,172-Speed 2981.52 samples/sec   Loss 8.2291   LearningRate 0.0312   Epoch: 8   Global Step: 109560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:04:48,521-Speed 3058.96 samples/sec   Loss 8.0798   LearningRate 0.0312   Epoch: 8   Global Step: 109570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:04:51,884-Speed 3045.12 samples/sec   Loss 8.1797   LearningRate 0.0312   Epoch: 8   Global Step: 109580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:04:55,212-Speed 3078.24 samples/sec   Loss 8.0720   LearningRate 0.0312   Epoch: 8   Global Step: 109590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:04:58,544-Speed 3074.41 samples/sec   Loss 8.1914   LearningRate 0.0312   Epoch: 8   Global Step: 109600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:01,955-Speed 3002.67 samples/sec   Loss 8.2060   LearningRate 0.0312   Epoch: 8   Global Step: 109610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:05,301-Speed 3061.11 samples/sec   Loss 8.1486   LearningRate 0.0312   Epoch: 8   Global Step: 109620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:08,695-Speed 3018.38 samples/sec   Loss 8.0677   LearningRate 0.0312   Epoch: 8   Global Step: 109630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:12,058-Speed 3045.07 samples/sec   Loss 8.2022   LearningRate 0.0312   Epoch: 8   Global Step: 109640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:15,428-Speed 3039.67 samples/sec   Loss 8.0908   LearningRate 0.0312   Epoch: 8   Global Step: 109650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:18,783-Speed 3053.44 samples/sec   Loss 8.1096   LearningRate 0.0312   Epoch: 8   Global Step: 109660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:22,186-Speed 3009.43 samples/sec   Loss 8.2518   LearningRate 0.0312   Epoch: 8   Global Step: 109670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:25,640-Speed 2965.54 samples/sec   Loss 8.2007   LearningRate 0.0312   Epoch: 8   Global Step: 109680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:05:29,073-Speed 2983.77 samples/sec   Loss 8.2880   LearningRate 0.0312   Epoch: 8   Global Step: 109690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:05:32,397-Speed 3080.85 samples/sec   Loss 8.0758   LearningRate 0.0312   Epoch: 8   Global Step: 109700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:05:35,775-Speed 3032.51 samples/sec   Loss 8.1534   LearningRate 0.0312   Epoch: 8   Global Step: 109710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:05:39,148-Speed 3037.03 samples/sec   Loss 8.2665   LearningRate 0.0312   Epoch: 8   Global Step: 109720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:42,481-Speed 3072.38 samples/sec   Loss 8.1720   LearningRate 0.0312   Epoch: 8   Global Step: 109730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:45,800-Speed 3086.40 samples/sec   Loss 8.2011   LearningRate 0.0312   Epoch: 8   Global Step: 109740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:49,155-Speed 3053.60 samples/sec   Loss 7.9432   LearningRate 0.0312   Epoch: 8   Global Step: 109750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:52,531-Speed 3033.48 samples/sec   Loss 8.1967   LearningRate 0.0312   Epoch: 8   Global Step: 109760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:55,879-Speed 3059.35 samples/sec   Loss 8.0646   LearningRate 0.0312   Epoch: 8   Global Step: 109770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:05:59,301-Speed 2993.26 samples/sec   Loss 8.0802   LearningRate 0.0311   Epoch: 8   Global Step: 109780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:02,712-Speed 3002.84 samples/sec   Loss 8.0553   LearningRate 0.0311   Epoch: 8   Global Step: 109790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:06,149-Speed 2980.21 samples/sec   Loss 8.1986   LearningRate 0.0311   Epoch: 8   Global Step: 109800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:09,591-Speed 2975.60 samples/sec   Loss 8.1971   LearningRate 0.0311   Epoch: 8   Global Step: 109810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:12,986-Speed 3017.19 samples/sec   Loss 8.1867   LearningRate 0.0311   Epoch: 8   Global Step: 109820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:16,442-Speed 2963.82 samples/sec   Loss 8.2094   LearningRate 0.0311   Epoch: 8   Global Step: 109830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:19,852-Speed 3003.63 samples/sec   Loss 8.0608   LearningRate 0.0311   Epoch: 8   Global Step: 109840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:23,214-Speed 3046.97 samples/sec   Loss 8.1491   LearningRate 0.0311   Epoch: 8   Global Step: 109850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:26,612-Speed 3013.79 samples/sec   Loss 8.3260   LearningRate 0.0311   Epoch: 8   Global Step: 109860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:29,985-Speed 3037.53 samples/sec   Loss 8.1187   LearningRate 0.0311   Epoch: 8   Global Step: 109870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:33,343-Speed 3049.72 samples/sec   Loss 8.1388   LearningRate 0.0311   Epoch: 8   Global Step: 109880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:36,683-Speed 3067.41 samples/sec   Loss 8.1354   LearningRate 0.0311   Epoch: 8   Global Step: 109890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:40,124-Speed 2976.63 samples/sec   Loss 8.1784   LearningRate 0.0311   Epoch: 8   Global Step: 109900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:43,504-Speed 3030.11 samples/sec   Loss 8.1546   LearningRate 0.0311   Epoch: 8   Global Step: 109910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:46,932-Speed 2988.06 samples/sec   Loss 8.1619   LearningRate 0.0311   Epoch: 8   Global Step: 109920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:06:50,344-Speed 3002.52 samples/sec   Loss 8.1707   LearningRate 0.0311   Epoch: 8   Global Step: 109930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:53,766-Speed 2992.68 samples/sec   Loss 8.1945   LearningRate 0.0311   Epoch: 8   Global Step: 109940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:06:57,250-Speed 2940.20 samples/sec   Loss 8.0350   LearningRate 0.0311   Epoch: 8   Global Step: 109950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:00,672-Speed 2992.93 samples/sec   Loss 8.0924   LearningRate 0.0311   Epoch: 8   Global Step: 109960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:04,030-Speed 3050.46 samples/sec   Loss 8.0791   LearningRate 0.0311   Epoch: 8   Global Step: 109970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:07,467-Speed 2979.81 samples/sec   Loss 8.2240   LearningRate 0.0311   Epoch: 8   Global Step: 109980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:10,946-Speed 2944.77 samples/sec   Loss 8.1931   LearningRate 0.0311   Epoch: 8   Global Step: 109990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:14,345-Speed 3013.45 samples/sec   Loss 8.0224   LearningRate 0.0310   Epoch: 8   Global Step: 110000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:17,791-Speed 2972.25 samples/sec   Loss 8.1256   LearningRate 0.0310   Epoch: 8   Global Step: 110010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:21,256-Speed 2955.35 samples/sec   Loss 8.1371   LearningRate 0.0310   Epoch: 8   Global Step: 110020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:24,694-Speed 2979.68 samples/sec   Loss 8.2226   LearningRate 0.0310   Epoch: 8   Global Step: 110030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:07:28,141-Speed 2971.89 samples/sec   Loss 8.0085   LearningRate 0.0310   Epoch: 8   Global Step: 110040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:31,504-Speed 3045.95 samples/sec   Loss 8.1728   LearningRate 0.0310   Epoch: 8   Global Step: 110050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:34,873-Speed 3040.40 samples/sec   Loss 8.1275   LearningRate 0.0310   Epoch: 8   Global Step: 110060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:38,292-Speed 2996.03 samples/sec   Loss 8.2269   LearningRate 0.0310   Epoch: 8   Global Step: 110070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:41,696-Speed 3009.02 samples/sec   Loss 8.0528   LearningRate 0.0310   Epoch: 8   Global Step: 110080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:45,020-Speed 3081.24 samples/sec   Loss 8.1314   LearningRate 0.0310   Epoch: 8   Global Step: 110090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:48,419-Speed 3013.08 samples/sec   Loss 8.0552   LearningRate 0.0310   Epoch: 8   Global Step: 110100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:51,799-Speed 3030.40 samples/sec   Loss 8.1933   LearningRate 0.0310   Epoch: 8   Global Step: 110110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:55,228-Speed 2987.18 samples/sec   Loss 8.1838   LearningRate 0.0310   Epoch: 8   Global Step: 110120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:07:58,673-Speed 2973.33 samples/sec   Loss 8.0770   LearningRate 0.0310   Epoch: 8   Global Step: 110130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:08:02,115-Speed 2976.48 samples/sec   Loss 8.0166   LearningRate 0.0310   Epoch: 8   Global Step: 110140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:08:05,565-Speed 2968.57 samples/sec   Loss 8.0490   LearningRate 0.0310   Epoch: 8   Global Step: 110150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:08:08,924-Speed 3049.68 samples/sec   Loss 7.9407   LearningRate 0.0310   Epoch: 8   Global Step: 110160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:08:12,393-Speed 2952.95 samples/sec   Loss 7.9776   LearningRate 0.0310   Epoch: 8   Global Step: 110170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:08:15,871-Speed 2944.42 samples/sec   Loss 8.0818   LearningRate 0.0310   Epoch: 8   Global Step: 110180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:08:19,249-Speed 3032.78 samples/sec   Loss 8.2472   LearningRate 0.0310   Epoch: 8   Global Step: 110190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:08:22,648-Speed 3013.64 samples/sec   Loss 7.9958   LearningRate 0.0310   Epoch: 8   Global Step: 110200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:08:26,042-Speed 3017.03 samples/sec   Loss 8.1105   LearningRate 0.0310   Epoch: 8   Global Step: 110210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:08:29,427-Speed 3026.81 samples/sec   Loss 8.0934   LearningRate 0.0309   Epoch: 8   Global Step: 110220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:08:32,869-Speed 2975.46 samples/sec   Loss 8.0059   LearningRate 0.0309   Epoch: 8   Global Step: 110230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:08:36,303-Speed 2982.54 samples/sec   Loss 7.9966   LearningRate 0.0309   Epoch: 8   Global Step: 110240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:08:39,701-Speed 3014.56 samples/sec   Loss 8.2429   LearningRate 0.0309   Epoch: 8   Global Step: 110250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:08:43,123-Speed 2993.83 samples/sec   Loss 7.9953   LearningRate 0.0309   Epoch: 8   Global Step: 110260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:08:46,520-Speed 3014.89 samples/sec   Loss 8.0036   LearningRate 0.0309   Epoch: 8   Global Step: 110270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:08:49,929-Speed 3004.53 samples/sec   Loss 8.0708   LearningRate 0.0309   Epoch: 8   Global Step: 110280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:08:53,317-Speed 3023.52 samples/sec   Loss 8.0316   LearningRate 0.0309   Epoch: 8   Global Step: 110290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:08:56,635-Speed 3086.86 samples/sec   Loss 8.1680   LearningRate 0.0309   Epoch: 8   Global Step: 110300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:08:59,998-Speed 3046.02 samples/sec   Loss 8.0713   LearningRate 0.0309   Epoch: 8   Global Step: 110310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:09:03,418-Speed 2995.18 samples/sec   Loss 8.1844   LearningRate 0.0309   Epoch: 8   Global Step: 110320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:09:06,768-Speed 3057.70 samples/sec   Loss 8.0076   LearningRate 0.0309   Epoch: 8   Global Step: 110330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:09:10,156-Speed 3023.41 samples/sec   Loss 8.1270   LearningRate 0.0309   Epoch: 8   Global Step: 110340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:09:13,564-Speed 3005.48 samples/sec   Loss 8.3112   LearningRate 0.0309   Epoch: 8   Global Step: 110350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:09:17,070-Speed 2921.21 samples/sec   Loss 8.0873   LearningRate 0.0309   Epoch: 8   Global Step: 110360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:09:20,497-Speed 2988.80 samples/sec   Loss 8.0790   LearningRate 0.0309   Epoch: 8   Global Step: 110370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:09:23,990-Speed 2932.67 samples/sec   Loss 8.1126   LearningRate 0.0309   Epoch: 8   Global Step: 110380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:09:27,380-Speed 3021.40 samples/sec   Loss 8.1722   LearningRate 0.0309   Epoch: 8   Global Step: 110390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:09:30,767-Speed 3024.76 samples/sec   Loss 8.1042   LearningRate 0.0309   Epoch: 8   Global Step: 110400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:09:34,189-Speed 2992.98 samples/sec   Loss 8.1032   LearningRate 0.0309   Epoch: 8   Global Step: 110410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:09:37,585-Speed 3016.06 samples/sec   Loss 8.1527   LearningRate 0.0309   Epoch: 8   Global Step: 110420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:09:41,014-Speed 2987.67 samples/sec   Loss 8.0768   LearningRate 0.0309   Epoch: 8   Global Step: 110430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:09:44,420-Speed 3007.08 samples/sec   Loss 8.1247   LearningRate 0.0309   Epoch: 8   Global Step: 110440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:09:47,826-Speed 3007.45 samples/sec   Loss 8.2074   LearningRate 0.0308   Epoch: 8   Global Step: 110450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:09:51,222-Speed 3016.10 samples/sec   Loss 8.0500   LearningRate 0.0308   Epoch: 8   Global Step: 110460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:09:54,589-Speed 3041.84 samples/sec   Loss 8.1085   LearningRate 0.0308   Epoch: 8   Global Step: 110470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:09:57,893-Speed 3100.72 samples/sec   Loss 8.0721   LearningRate 0.0308   Epoch: 8   Global Step: 110480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:01,286-Speed 3019.69 samples/sec   Loss 8.0364   LearningRate 0.0308   Epoch: 8   Global Step: 110490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:04,731-Speed 2972.88 samples/sec   Loss 8.0813   LearningRate 0.0308   Epoch: 8   Global Step: 110500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:08,130-Speed 3013.81 samples/sec   Loss 8.1887   LearningRate 0.0308   Epoch: 8   Global Step: 110510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:11,452-Speed 3083.43 samples/sec   Loss 8.2213   LearningRate 0.0308   Epoch: 8   Global Step: 110520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:14,825-Speed 3036.66 samples/sec   Loss 8.1382   LearningRate 0.0308   Epoch: 8   Global Step: 110530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:18,267-Speed 2975.79 samples/sec   Loss 8.0192   LearningRate 0.0308   Epoch: 8   Global Step: 110540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:21,683-Speed 2998.86 samples/sec   Loss 8.0497   LearningRate 0.0308   Epoch: 8   Global Step: 110550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:25,109-Speed 2989.88 samples/sec   Loss 8.0224   LearningRate 0.0308   Epoch: 8   Global Step: 110560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:28,493-Speed 3026.00 samples/sec   Loss 7.9625   LearningRate 0.0308   Epoch: 8   Global Step: 110570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:31,905-Speed 3002.78 samples/sec   Loss 8.0571   LearningRate 0.0308   Epoch: 8   Global Step: 110580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:35,267-Speed 3045.94 samples/sec   Loss 8.1288   LearningRate 0.0308   Epoch: 8   Global Step: 110590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:38,599-Speed 3074.67 samples/sec   Loss 8.0902   LearningRate 0.0308   Epoch: 8   Global Step: 110600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:42,059-Speed 2960.53 samples/sec   Loss 8.1531   LearningRate 0.0308   Epoch: 8   Global Step: 110610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:45,390-Speed 3075.18 samples/sec   Loss 8.0863   LearningRate 0.0308   Epoch: 8   Global Step: 110620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:48,738-Speed 3059.32 samples/sec   Loss 8.0816   LearningRate 0.0308   Epoch: 8   Global Step: 110630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:52,052-Speed 3090.61 samples/sec   Loss 8.0344   LearningRate 0.0308   Epoch: 8   Global Step: 110640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:55,407-Speed 3053.14 samples/sec   Loss 8.0679   LearningRate 0.0308   Epoch: 8   Global Step: 110650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:10:58,737-Speed 3076.00 samples/sec   Loss 8.0655   LearningRate 0.0308   Epoch: 8   Global Step: 110660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:02,177-Speed 2977.59 samples/sec   Loss 8.1217   LearningRate 0.0307   Epoch: 8   Global Step: 110670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:05,470-Speed 3110.28 samples/sec   Loss 8.0534   LearningRate 0.0307   Epoch: 8   Global Step: 110680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:08,804-Speed 3072.50 samples/sec   Loss 8.0304   LearningRate 0.0307   Epoch: 8   Global Step: 110690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:12,131-Speed 3078.88 samples/sec   Loss 8.1020   LearningRate 0.0307   Epoch: 8   Global Step: 110700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:15,466-Speed 3071.06 samples/sec   Loss 8.0897   LearningRate 0.0307   Epoch: 8   Global Step: 110710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:18,885-Speed 2995.81 samples/sec   Loss 8.1095   LearningRate 0.0307   Epoch: 8   Global Step: 110720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:22,201-Speed 3088.67 samples/sec   Loss 7.9748   LearningRate 0.0307   Epoch: 8   Global Step: 110730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:25,513-Speed 3092.86 samples/sec   Loss 8.1767   LearningRate 0.0307   Epoch: 8   Global Step: 110740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:28,826-Speed 3091.25 samples/sec   Loss 8.0675   LearningRate 0.0307   Epoch: 8   Global Step: 110750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:32,163-Speed 3071.49 samples/sec   Loss 8.1230   LearningRate 0.0307   Epoch: 8   Global Step: 110760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:35,497-Speed 3071.85 samples/sec   Loss 8.0073   LearningRate 0.0307   Epoch: 8   Global Step: 110770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:38,784-Speed 3116.51 samples/sec   Loss 8.1620   LearningRate 0.0307   Epoch: 8   Global Step: 110780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:42,117-Speed 3073.65 samples/sec   Loss 8.2410   LearningRate 0.0307   Epoch: 8   Global Step: 110790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:11:45,488-Speed 3038.33 samples/sec   Loss 8.0068   LearningRate 0.0307   Epoch: 8   Global Step: 110800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:11:48,854-Speed 3043.41 samples/sec   Loss 7.8945   LearningRate 0.0307   Epoch: 8   Global Step: 110810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:11:52,203-Speed 3058.15 samples/sec   Loss 8.0430   LearningRate 0.0307   Epoch: 8   Global Step: 110820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:11:55,562-Speed 3048.75 samples/sec   Loss 8.0339   LearningRate 0.0307   Epoch: 8   Global Step: 110830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:11:58,934-Speed 3037.61 samples/sec   Loss 8.0478   LearningRate 0.0307   Epoch: 8   Global Step: 110840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:02,262-Speed 3078.25 samples/sec   Loss 8.0693   LearningRate 0.0307   Epoch: 8   Global Step: 110850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:05,620-Speed 3049.97 samples/sec   Loss 8.0855   LearningRate 0.0307   Epoch: 8   Global Step: 110860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:09,045-Speed 2990.80 samples/sec   Loss 8.1603   LearningRate 0.0307   Epoch: 8   Global Step: 110870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:12,455-Speed 3004.53 samples/sec   Loss 8.0106   LearningRate 0.0307   Epoch: 8   Global Step: 110880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:15,889-Speed 2982.20 samples/sec   Loss 8.0984   LearningRate 0.0306   Epoch: 8   Global Step: 110890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:19,312-Speed 2992.14 samples/sec   Loss 8.0774   LearningRate 0.0306   Epoch: 8   Global Step: 110900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:12:22,699-Speed 3024.12 samples/sec   Loss 8.1820   LearningRate 0.0306   Epoch: 8   Global Step: 110910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:12:25,998-Speed 3105.18 samples/sec   Loss 8.0370   LearningRate 0.0306   Epoch: 8   Global Step: 110920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:29,362-Speed 3045.30 samples/sec   Loss 8.0154   LearningRate 0.0306   Epoch: 8   Global Step: 110930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:32,718-Speed 3052.07 samples/sec   Loss 8.0205   LearningRate 0.0306   Epoch: 8   Global Step: 110940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:36,133-Speed 2998.67 samples/sec   Loss 8.1616   LearningRate 0.0306   Epoch: 8   Global Step: 110950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:39,562-Speed 2987.23 samples/sec   Loss 8.1235   LearningRate 0.0306   Epoch: 8   Global Step: 110960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:43,036-Speed 2949.24 samples/sec   Loss 8.1823   LearningRate 0.0306   Epoch: 8   Global Step: 110970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:46,455-Speed 2996.18 samples/sec   Loss 8.1428   LearningRate 0.0306   Epoch: 8   Global Step: 110980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:49,967-Speed 2915.64 samples/sec   Loss 8.0269   LearningRate 0.0306   Epoch: 8   Global Step: 110990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:53,370-Speed 3010.85 samples/sec   Loss 8.0658   LearningRate 0.0306   Epoch: 8   Global Step: 111000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:12:56,793-Speed 2992.13 samples/sec   Loss 7.9771   LearningRate 0.0306   Epoch: 8   Global Step: 111010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:13:00,217-Speed 2991.86 samples/sec   Loss 8.0327   LearningRate 0.0306   Epoch: 8   Global Step: 111020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:03,587-Speed 3039.09 samples/sec   Loss 8.0427   LearningRate 0.0306   Epoch: 8   Global Step: 111030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:06,926-Speed 3068.18 samples/sec   Loss 8.0107   LearningRate 0.0306   Epoch: 8   Global Step: 111040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:10,305-Speed 3031.11 samples/sec   Loss 7.9520   LearningRate 0.0306   Epoch: 8   Global Step: 111050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:13,676-Speed 3038.31 samples/sec   Loss 8.1454   LearningRate 0.0306   Epoch: 8   Global Step: 111060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:17,037-Speed 3047.57 samples/sec   Loss 8.1288   LearningRate 0.0306   Epoch: 8   Global Step: 111070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:20,433-Speed 3016.60 samples/sec   Loss 8.0885   LearningRate 0.0306   Epoch: 8   Global Step: 111080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:23,755-Speed 3082.65 samples/sec   Loss 7.9836   LearningRate 0.0306   Epoch: 8   Global Step: 111090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:27,162-Speed 3007.09 samples/sec   Loss 8.0026   LearningRate 0.0306   Epoch: 8   Global Step: 111100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:30,563-Speed 3011.46 samples/sec   Loss 8.0870   LearningRate 0.0306   Epoch: 8   Global Step: 111110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:34,066-Speed 2923.69 samples/sec   Loss 7.8915   LearningRate 0.0305   Epoch: 8   Global Step: 111120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:13:37,414-Speed 3059.08 samples/sec   Loss 8.0934   LearningRate 0.0305   Epoch: 8   Global Step: 111130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:13:40,818-Speed 3009.66 samples/sec   Loss 8.1175   LearningRate 0.0305   Epoch: 8   Global Step: 111140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:13:44,209-Speed 3020.41 samples/sec   Loss 8.1310   LearningRate 0.0305   Epoch: 8   Global Step: 111150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:13:47,574-Speed 3043.88 samples/sec   Loss 8.1166   LearningRate 0.0305   Epoch: 8   Global Step: 111160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:51,042-Speed 2953.35 samples/sec   Loss 8.0664   LearningRate 0.0305   Epoch: 8   Global Step: 111170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:54,523-Speed 2942.37 samples/sec   Loss 8.1278   LearningRate 0.0305   Epoch: 8   Global Step: 111180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:13:57,970-Speed 2972.03 samples/sec   Loss 8.0791   LearningRate 0.0305   Epoch: 8   Global Step: 111190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:01,403-Speed 2983.39 samples/sec   Loss 7.9122   LearningRate 0.0305   Epoch: 8   Global Step: 111200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:04,774-Speed 3038.60 samples/sec   Loss 8.1115   LearningRate 0.0305   Epoch: 8   Global Step: 111210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:08,160-Speed 3025.04 samples/sec   Loss 8.0942   LearningRate 0.0305   Epoch: 8   Global Step: 111220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:11,516-Speed 3052.67 samples/sec   Loss 8.1465   LearningRate 0.0305   Epoch: 8   Global Step: 111230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:14,843-Speed 3078.01 samples/sec   Loss 8.0661   LearningRate 0.0305   Epoch: 8   Global Step: 111240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:18,249-Speed 3008.03 samples/sec   Loss 8.0275   LearningRate 0.0305   Epoch: 8   Global Step: 111250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:21,601-Speed 3055.66 samples/sec   Loss 8.0060   LearningRate 0.0305   Epoch: 8   Global Step: 111260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:24,992-Speed 3020.21 samples/sec   Loss 8.0931   LearningRate 0.0305   Epoch: 8   Global Step: 111270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:28,381-Speed 3022.79 samples/sec   Loss 7.9818   LearningRate 0.0305   Epoch: 8   Global Step: 111280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:31,800-Speed 2995.62 samples/sec   Loss 8.1519   LearningRate 0.0305   Epoch: 8   Global Step: 111290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:35,151-Speed 3055.99 samples/sec   Loss 8.1278   LearningRate 0.0305   Epoch: 8   Global Step: 111300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:38,573-Speed 2993.78 samples/sec   Loss 8.0172   LearningRate 0.0305   Epoch: 8   Global Step: 111310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:42,025-Speed 2967.75 samples/sec   Loss 7.8189   LearningRate 0.0305   Epoch: 8   Global Step: 111320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:45,489-Speed 2956.87 samples/sec   Loss 8.0960   LearningRate 0.0305   Epoch: 8   Global Step: 111330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:48,818-Speed 3076.48 samples/sec   Loss 8.0732   LearningRate 0.0304   Epoch: 8   Global Step: 111340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:52,183-Speed 3044.46 samples/sec   Loss 8.2099   LearningRate 0.0304   Epoch: 8   Global Step: 111350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:14:55,511-Speed 3077.70 samples/sec   Loss 8.0162   LearningRate 0.0304   Epoch: 8   Global Step: 111360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:14:58,919-Speed 3006.13 samples/sec   Loss 8.0258   LearningRate 0.0304   Epoch: 8   Global Step: 111370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:02,329-Speed 3003.31 samples/sec   Loss 7.9254   LearningRate 0.0304   Epoch: 8   Global Step: 111380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:05,767-Speed 2979.08 samples/sec   Loss 8.0804   LearningRate 0.0304   Epoch: 8   Global Step: 111390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:09,176-Speed 3004.74 samples/sec   Loss 8.0447   LearningRate 0.0304   Epoch: 8   Global Step: 111400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:12,576-Speed 3013.09 samples/sec   Loss 8.0326   LearningRate 0.0304   Epoch: 8   Global Step: 111410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:15,970-Speed 3018.02 samples/sec   Loss 7.9991   LearningRate 0.0304   Epoch: 8   Global Step: 111420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:19,420-Speed 2968.56 samples/sec   Loss 8.0413   LearningRate 0.0304   Epoch: 8   Global Step: 111430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:22,847-Speed 2989.34 samples/sec   Loss 7.9538   LearningRate 0.0304   Epoch: 8   Global Step: 111440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:26,245-Speed 3014.20 samples/sec   Loss 8.0972   LearningRate 0.0304   Epoch: 8   Global Step: 111450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:29,570-Speed 3080.53 samples/sec   Loss 8.0895   LearningRate 0.0304   Epoch: 8   Global Step: 111460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:32,955-Speed 3026.05 samples/sec   Loss 8.1670   LearningRate 0.0304   Epoch: 8   Global Step: 111470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:36,377-Speed 2992.40 samples/sec   Loss 8.1307   LearningRate 0.0304   Epoch: 8   Global Step: 111480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:39,770-Speed 3019.11 samples/sec   Loss 8.0239   LearningRate 0.0304   Epoch: 8   Global Step: 111490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:43,161-Speed 3021.13 samples/sec   Loss 8.0311   LearningRate 0.0304   Epoch: 8   Global Step: 111500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:46,539-Speed 3031.34 samples/sec   Loss 8.0936   LearningRate 0.0304   Epoch: 8   Global Step: 111510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:49,870-Speed 3075.89 samples/sec   Loss 7.9892   LearningRate 0.0304   Epoch: 8   Global Step: 111520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:53,298-Speed 2987.89 samples/sec   Loss 8.0007   LearningRate 0.0304   Epoch: 8   Global Step: 111530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:15:56,678-Speed 3029.82 samples/sec   Loss 7.9842   LearningRate 0.0304   Epoch: 8   Global Step: 111540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:16:00,083-Speed 3009.14 samples/sec   Loss 8.0385   LearningRate 0.0304   Epoch: 8   Global Step: 111550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:16:03,478-Speed 3017.11 samples/sec   Loss 8.0918   LearningRate 0.0304   Epoch: 8   Global Step: 111560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:16:06,835-Speed 3051.26 samples/sec   Loss 7.9576   LearningRate 0.0303   Epoch: 8   Global Step: 111570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:16:10,205-Speed 3039.33 samples/sec   Loss 8.0437   LearningRate 0.0303   Epoch: 8   Global Step: 111580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:16:13,580-Speed 3035.59 samples/sec   Loss 8.1585   LearningRate 0.0303   Epoch: 8   Global Step: 111590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:16:16,909-Speed 3076.59 samples/sec   Loss 7.8537   LearningRate 0.0303   Epoch: 8   Global Step: 111600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:16:20,262-Speed 3054.62 samples/sec   Loss 7.9779   LearningRate 0.0303   Epoch: 8   Global Step: 111610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:16:23,683-Speed 2994.27 samples/sec   Loss 7.9636   LearningRate 0.0303   Epoch: 8   Global Step: 111620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:16:27,173-Speed 2934.50 samples/sec   Loss 7.8973   LearningRate 0.0303   Epoch: 8   Global Step: 111630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:16:30,623-Speed 2969.52 samples/sec   Loss 8.0664   LearningRate 0.0303   Epoch: 8   Global Step: 111640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:16:33,947-Speed 3080.99 samples/sec   Loss 8.1076   LearningRate 0.0303   Epoch: 8   Global Step: 111650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:16:37,272-Speed 3080.55 samples/sec   Loss 8.0786   LearningRate 0.0303   Epoch: 8   Global Step: 111660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:16:40,610-Speed 3068.77 samples/sec   Loss 8.1555   LearningRate 0.0303   Epoch: 8   Global Step: 111670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:16:44,029-Speed 2995.63 samples/sec   Loss 7.8890   LearningRate 0.0303   Epoch: 8   Global Step: 111680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:16:47,346-Speed 3088.41 samples/sec   Loss 7.9437   LearningRate 0.0303   Epoch: 8   Global Step: 111690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:16:50,751-Speed 3009.38 samples/sec   Loss 7.9943   LearningRate 0.0303   Epoch: 8   Global Step: 111700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:16:54,077-Speed 3078.72 samples/sec   Loss 8.0499   LearningRate 0.0303   Epoch: 8   Global Step: 111710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:16:57,385-Speed 3096.55 samples/sec   Loss 7.9863   LearningRate 0.0303   Epoch: 8   Global Step: 111720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:17:00,783-Speed 3014.17 samples/sec   Loss 8.0515   LearningRate 0.0303   Epoch: 8   Global Step: 111730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:17:04,174-Speed 3021.16 samples/sec   Loss 7.9340   LearningRate 0.0303   Epoch: 8   Global Step: 111740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:17:07,619-Speed 2972.59 samples/sec   Loss 7.8249   LearningRate 0.0303   Epoch: 8   Global Step: 111750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:17:11,090-Speed 2951.71 samples/sec   Loss 8.0300   LearningRate 0.0303   Epoch: 8   Global Step: 111760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:17:14,501-Speed 3003.02 samples/sec   Loss 7.9893   LearningRate 0.0303   Epoch: 8   Global Step: 111770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:17:18,178-Speed 2785.43 samples/sec   Loss 8.0795   LearningRate 0.0303   Epoch: 8   Global Step: 111780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:17:50,148-Speed 320.31 samples/sec   Loss 7.9396   LearningRate 0.0302   Epoch: 9   Global Step: 111790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:17:53,635-Speed 2937.50 samples/sec   Loss 6.4272   LearningRate 0.0302   Epoch: 9   Global Step: 111800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:17:57,382-Speed 2735.05 samples/sec   Loss 6.5976   LearningRate 0.0302   Epoch: 9   Global Step: 111810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:18:00,821-Speed 2978.19 samples/sec   Loss 6.4675   LearningRate 0.0302   Epoch: 9   Global Step: 111820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:18:04,228-Speed 3006.77 samples/sec   Loss 6.5378   LearningRate 0.0302   Epoch: 9   Global Step: 111830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:18:07,727-Speed 2927.79 samples/sec   Loss 6.4939   LearningRate 0.0302   Epoch: 9   Global Step: 111840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:18:11,404-Speed 2785.68 samples/sec   Loss 6.7012   LearningRate 0.0302   Epoch: 9   Global Step: 111850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:18:14,806-Speed 3010.29 samples/sec   Loss 6.5154   LearningRate 0.0302   Epoch: 9   Global Step: 111860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:18:18,190-Speed 3026.92 samples/sec   Loss 6.3353   LearningRate 0.0302   Epoch: 9   Global Step: 111870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:18:21,617-Speed 2989.91 samples/sec   Loss 6.6473   LearningRate 0.0302   Epoch: 9   Global Step: 111880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:18:24,937-Speed 3085.11 samples/sec   Loss 6.6460   LearningRate 0.0302   Epoch: 9   Global Step: 111890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:18:28,430-Speed 2932.28 samples/sec   Loss 6.6492   LearningRate 0.0302   Epoch: 9   Global Step: 111900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:18:31,841-Speed 3003.39 samples/sec   Loss 6.4821   LearningRate 0.0302   Epoch: 9   Global Step: 111910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:18:35,216-Speed 3034.68 samples/sec   Loss 6.4642   LearningRate 0.0302   Epoch: 9   Global Step: 111920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:18:38,718-Speed 2924.51 samples/sec   Loss 6.5693   LearningRate 0.0302   Epoch: 9   Global Step: 111930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:18:42,133-Speed 3000.34 samples/sec   Loss 6.4288   LearningRate 0.0302   Epoch: 9   Global Step: 111940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:18:45,562-Speed 2986.23 samples/sec   Loss 6.5631   LearningRate 0.0302   Epoch: 9   Global Step: 111950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:18:48,970-Speed 3005.98 samples/sec   Loss 6.5111   LearningRate 0.0302   Epoch: 9   Global Step: 111960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:18:52,438-Speed 2953.79 samples/sec   Loss 6.5285   LearningRate 0.0302   Epoch: 9   Global Step: 111970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:18:55,855-Speed 2997.40 samples/sec   Loss 6.6465   LearningRate 0.0302   Epoch: 9   Global Step: 111980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:18:59,179-Speed 3081.67 samples/sec   Loss 6.4550   LearningRate 0.0302   Epoch: 9   Global Step: 111990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:19:02,523-Speed 3063.09 samples/sec   Loss 6.5402   LearningRate 0.0302   Epoch: 9   Global Step: 112000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:19:05,879-Speed 3052.04 samples/sec   Loss 6.4757   LearningRate 0.0302   Epoch: 9   Global Step: 112010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:19:09,320-Speed 2976.79 samples/sec   Loss 6.5689   LearningRate 0.0301   Epoch: 9   Global Step: 112020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:19:12,738-Speed 2998.15 samples/sec   Loss 6.5343   LearningRate 0.0301   Epoch: 9   Global Step: 112030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:19:16,171-Speed 2983.52 samples/sec   Loss 6.6244   LearningRate 0.0301   Epoch: 9   Global Step: 112040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:19:19,566-Speed 3016.98 samples/sec   Loss 6.5650   LearningRate 0.0301   Epoch: 9   Global Step: 112050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:19:22,935-Speed 3040.51 samples/sec   Loss 6.6300   LearningRate 0.0301   Epoch: 9   Global Step: 112060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:19:26,286-Speed 3057.69 samples/sec   Loss 6.7034   LearningRate 0.0301   Epoch: 9   Global Step: 112070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:19:29,698-Speed 3002.29 samples/sec   Loss 6.6716   LearningRate 0.0301   Epoch: 9   Global Step: 112080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:19:33,073-Speed 3035.00 samples/sec   Loss 6.8011   LearningRate 0.0301   Epoch: 9   Global Step: 112090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:19:36,696-Speed 2827.15 samples/sec   Loss 6.7745   LearningRate 0.0301   Epoch: 9   Global Step: 112100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:19:40,135-Speed 2978.66 samples/sec   Loss 6.6185   LearningRate 0.0301   Epoch: 9   Global Step: 112110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:19:43,490-Speed 3052.84 samples/sec   Loss 6.6035   LearningRate 0.0301   Epoch: 9   Global Step: 112120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:19:46,825-Speed 3071.27 samples/sec   Loss 6.6305   LearningRate 0.0301   Epoch: 9   Global Step: 112130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:19:50,192-Speed 3042.63 samples/sec   Loss 6.7928   LearningRate 0.0301   Epoch: 9   Global Step: 112140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:19:53,567-Speed 3034.46 samples/sec   Loss 6.6154   LearningRate 0.0301   Epoch: 9   Global Step: 112150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:19:56,928-Speed 3047.83 samples/sec   Loss 6.8104   LearningRate 0.0301   Epoch: 9   Global Step: 112160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:00,374-Speed 2972.77 samples/sec   Loss 6.6267   LearningRate 0.0301   Epoch: 9   Global Step: 112170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:03,820-Speed 2972.48 samples/sec   Loss 6.7334   LearningRate 0.0301   Epoch: 9   Global Step: 112180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:07,201-Speed 3029.16 samples/sec   Loss 6.6279   LearningRate 0.0301   Epoch: 9   Global Step: 112190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:10,554-Speed 3055.80 samples/sec   Loss 6.6329   LearningRate 0.0301   Epoch: 9   Global Step: 112200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:13,947-Speed 3018.21 samples/sec   Loss 6.7617   LearningRate 0.0301   Epoch: 9   Global Step: 112210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:17,303-Speed 3052.42 samples/sec   Loss 6.7775   LearningRate 0.0301   Epoch: 9   Global Step: 112220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:20,639-Speed 3070.99 samples/sec   Loss 6.7726   LearningRate 0.0301   Epoch: 9   Global Step: 112230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:24,083-Speed 2973.77 samples/sec   Loss 6.7819   LearningRate 0.0301   Epoch: 9   Global Step: 112240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:27,445-Speed 3046.42 samples/sec   Loss 6.6077   LearningRate 0.0300   Epoch: 9   Global Step: 112250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:30,800-Speed 3053.21 samples/sec   Loss 6.6535   LearningRate 0.0300   Epoch: 9   Global Step: 112260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:34,183-Speed 3027.96 samples/sec   Loss 6.8600   LearningRate 0.0300   Epoch: 9   Global Step: 112270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:37,512-Speed 3076.59 samples/sec   Loss 6.8256   LearningRate 0.0300   Epoch: 9   Global Step: 112280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:40,890-Speed 3032.75 samples/sec   Loss 6.6725   LearningRate 0.0300   Epoch: 9   Global Step: 112290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:44,230-Speed 3066.65 samples/sec   Loss 6.7984   LearningRate 0.0300   Epoch: 9   Global Step: 112300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:47,575-Speed 3062.37 samples/sec   Loss 6.7176   LearningRate 0.0300   Epoch: 9   Global Step: 112310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:50,987-Speed 3001.71 samples/sec   Loss 6.7263   LearningRate 0.0300   Epoch: 9   Global Step: 112320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:54,380-Speed 3018.88 samples/sec   Loss 6.7198   LearningRate 0.0300   Epoch: 9   Global Step: 112330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:20:57,758-Speed 3032.74 samples/sec   Loss 6.7502   LearningRate 0.0300   Epoch: 9   Global Step: 112340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:01,169-Speed 3002.68 samples/sec   Loss 6.7038   LearningRate 0.0300   Epoch: 9   Global Step: 112350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:04,485-Speed 3089.42 samples/sec   Loss 6.8808   LearningRate 0.0300   Epoch: 9   Global Step: 112360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:07,915-Speed 2986.51 samples/sec   Loss 6.8269   LearningRate 0.0300   Epoch: 9   Global Step: 112370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:11,322-Speed 3006.79 samples/sec   Loss 6.6608   LearningRate 0.0300   Epoch: 9   Global Step: 112380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:14,730-Speed 3005.48 samples/sec   Loss 6.7673   LearningRate 0.0300   Epoch: 9   Global Step: 112390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:18,182-Speed 2966.96 samples/sec   Loss 6.7637   LearningRate 0.0300   Epoch: 9   Global Step: 112400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:21,645-Speed 2958.38 samples/sec   Loss 6.8654   LearningRate 0.0300   Epoch: 9   Global Step: 112410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:25,080-Speed 2981.60 samples/sec   Loss 6.7682   LearningRate 0.0300   Epoch: 9   Global Step: 112420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:28,391-Speed 3094.00 samples/sec   Loss 6.6873   LearningRate 0.0300   Epoch: 9   Global Step: 112430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:31,793-Speed 3011.35 samples/sec   Loss 6.8076   LearningRate 0.0300   Epoch: 9   Global Step: 112440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:35,180-Speed 3024.00 samples/sec   Loss 6.9556   LearningRate 0.0300   Epoch: 9   Global Step: 112450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:38,523-Speed 3064.37 samples/sec   Loss 6.8125   LearningRate 0.0300   Epoch: 9   Global Step: 112460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:21:41,900-Speed 3033.62 samples/sec   Loss 6.9509   LearningRate 0.0299   Epoch: 9   Global Step: 112470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:45,254-Speed 3053.83 samples/sec   Loss 6.8179   LearningRate 0.0299   Epoch: 9   Global Step: 112480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:48,678-Speed 2991.30 samples/sec   Loss 6.9104   LearningRate 0.0299   Epoch: 9   Global Step: 112490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:52,061-Speed 3028.27 samples/sec   Loss 6.9144   LearningRate 0.0299   Epoch: 9   Global Step: 112500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:55,420-Speed 3048.48 samples/sec   Loss 6.7451   LearningRate 0.0299   Epoch: 9   Global Step: 112510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:21:58,746-Speed 3080.19 samples/sec   Loss 6.7879   LearningRate 0.0299   Epoch: 9   Global Step: 112520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:22:02,094-Speed 3059.15 samples/sec   Loss 6.7976   LearningRate 0.0299   Epoch: 9   Global Step: 112530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:22:05,522-Speed 2996.76 samples/sec   Loss 6.8640   LearningRate 0.0299   Epoch: 9   Global Step: 112540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:22:08,890-Speed 3041.89 samples/sec   Loss 6.9808   LearningRate 0.0299   Epoch: 9   Global Step: 112550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:22:12,203-Speed 3091.56 samples/sec   Loss 6.8999   LearningRate 0.0299   Epoch: 9   Global Step: 112560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:22:15,559-Speed 3051.63 samples/sec   Loss 6.9137   LearningRate 0.0299   Epoch: 9   Global Step: 112570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:22:18,951-Speed 3019.93 samples/sec   Loss 6.9075   LearningRate 0.0299   Epoch: 9   Global Step: 112580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:22:22,335-Speed 3026.63 samples/sec   Loss 6.8338   LearningRate 0.0299   Epoch: 9   Global Step: 112590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:22:25,693-Speed 3050.78 samples/sec   Loss 6.9488   LearningRate 0.0299   Epoch: 9   Global Step: 112600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:22:29,079-Speed 3024.72 samples/sec   Loss 6.9399   LearningRate 0.0299   Epoch: 9   Global Step: 112610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:22:32,557-Speed 2945.57 samples/sec   Loss 6.8711   LearningRate 0.0299   Epoch: 9   Global Step: 112620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:22:35,884-Speed 3078.82 samples/sec   Loss 6.8919   LearningRate 0.0299   Epoch: 9   Global Step: 112630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:22:39,221-Speed 3069.61 samples/sec   Loss 6.9398   LearningRate 0.0299   Epoch: 9   Global Step: 112640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:22:42,587-Speed 3043.21 samples/sec   Loss 6.7586   LearningRate 0.0299   Epoch: 9   Global Step: 112650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:22:45,939-Speed 3054.85 samples/sec   Loss 6.8874   LearningRate 0.0299   Epoch: 9   Global Step: 112660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:22:49,309-Speed 3040.27 samples/sec   Loss 6.9959   LearningRate 0.0299   Epoch: 9   Global Step: 112670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:22:52,696-Speed 3024.04 samples/sec   Loss 6.9910   LearningRate 0.0299   Epoch: 9   Global Step: 112680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:22:56,075-Speed 3031.00 samples/sec   Loss 6.8797   LearningRate 0.0299   Epoch: 9   Global Step: 112690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:22:59,420-Speed 3062.83 samples/sec   Loss 6.9339   LearningRate 0.0298   Epoch: 9   Global Step: 112700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:23:02,750-Speed 3076.13 samples/sec   Loss 6.8918   LearningRate 0.0298   Epoch: 9   Global Step: 112710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:23:06,096-Speed 3060.77 samples/sec   Loss 6.9017   LearningRate 0.0298   Epoch: 9   Global Step: 112720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:23:09,419-Speed 3083.11 samples/sec   Loss 6.9787   LearningRate 0.0298   Epoch: 9   Global Step: 112730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:23:12,882-Speed 2957.84 samples/sec   Loss 7.0670   LearningRate 0.0298   Epoch: 9   Global Step: 112740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:23:16,243-Speed 3047.44 samples/sec   Loss 7.0803   LearningRate 0.0298   Epoch: 9   Global Step: 112750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:23:19,678-Speed 2981.58 samples/sec   Loss 6.9897   LearningRate 0.0298   Epoch: 9   Global Step: 112760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:23:23,074-Speed 3016.36 samples/sec   Loss 6.9837   LearningRate 0.0298   Epoch: 9   Global Step: 112770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:23:26,417-Speed 3064.17 samples/sec   Loss 6.8802   LearningRate 0.0298   Epoch: 9   Global Step: 112780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:23:29,764-Speed 3059.99 samples/sec   Loss 7.0078   LearningRate 0.0298   Epoch: 9   Global Step: 112790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:23:33,136-Speed 3037.44 samples/sec   Loss 6.9307   LearningRate 0.0298   Epoch: 9   Global Step: 112800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:23:36,542-Speed 3007.93 samples/sec   Loss 7.0503   LearningRate 0.0298   Epoch: 9   Global Step: 112810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:23:40,018-Speed 2946.19 samples/sec   Loss 7.0000   LearningRate 0.0298   Epoch: 9   Global Step: 112820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:23:43,427-Speed 3004.80 samples/sec   Loss 6.9267   LearningRate 0.0298   Epoch: 9   Global Step: 112830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:23:46,834-Speed 3006.43 samples/sec   Loss 6.9228   LearningRate 0.0298   Epoch: 9   Global Step: 112840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:23:50,249-Speed 2999.84 samples/sec   Loss 7.0665   LearningRate 0.0298   Epoch: 9   Global Step: 112850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:23:53,697-Speed 2970.54 samples/sec   Loss 6.9871   LearningRate 0.0298   Epoch: 9   Global Step: 112860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:23:57,150-Speed 2966.28 samples/sec   Loss 6.9001   LearningRate 0.0298   Epoch: 9   Global Step: 112870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:00,571-Speed 2994.26 samples/sec   Loss 7.0508   LearningRate 0.0298   Epoch: 9   Global Step: 112880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:03,955-Speed 3027.18 samples/sec   Loss 6.9928   LearningRate 0.0298   Epoch: 9   Global Step: 112890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:07,333-Speed 3032.35 samples/sec   Loss 6.8324   LearningRate 0.0298   Epoch: 9   Global Step: 112900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:10,823-Speed 2935.53 samples/sec   Loss 6.9922   LearningRate 0.0298   Epoch: 9   Global Step: 112910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:14,162-Speed 3067.28 samples/sec   Loss 7.0856   LearningRate 0.0298   Epoch: 9   Global Step: 112920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:17,503-Speed 3066.16 samples/sec   Loss 7.0554   LearningRate 0.0297   Epoch: 9   Global Step: 112930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:20,928-Speed 2990.16 samples/sec   Loss 6.9404   LearningRate 0.0297   Epoch: 9   Global Step: 112940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:24,276-Speed 3059.79 samples/sec   Loss 7.0594   LearningRate 0.0297   Epoch: 9   Global Step: 112950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:27,632-Speed 3052.32 samples/sec   Loss 6.9880   LearningRate 0.0297   Epoch: 9   Global Step: 112960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:31,065-Speed 2983.00 samples/sec   Loss 7.0597   LearningRate 0.0297   Epoch: 9   Global Step: 112970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:24:34,442-Speed 3033.81 samples/sec   Loss 7.0925   LearningRate 0.0297   Epoch: 9   Global Step: 112980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:24:37,814-Speed 3037.36 samples/sec   Loss 6.9986   LearningRate 0.0297   Epoch: 9   Global Step: 112990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:24:41,160-Speed 3061.47 samples/sec   Loss 6.9859   LearningRate 0.0297   Epoch: 9   Global Step: 113000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:44,535-Speed 3035.23 samples/sec   Loss 7.0399   LearningRate 0.0297   Epoch: 9   Global Step: 113010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:47,853-Speed 3087.58 samples/sec   Loss 7.1107   LearningRate 0.0297   Epoch: 9   Global Step: 113020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:51,202-Speed 3058.35 samples/sec   Loss 7.0566   LearningRate 0.0297   Epoch: 9   Global Step: 113030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:54,539-Speed 3069.84 samples/sec   Loss 7.1117   LearningRate 0.0297   Epoch: 9   Global Step: 113040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:24:57,978-Speed 2978.48 samples/sec   Loss 7.1408   LearningRate 0.0297   Epoch: 9   Global Step: 113050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:01,397-Speed 2996.25 samples/sec   Loss 7.0490   LearningRate 0.0297   Epoch: 9   Global Step: 113060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:04,730-Speed 3072.63 samples/sec   Loss 7.1159   LearningRate 0.0297   Epoch: 9   Global Step: 113070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:08,158-Speed 2988.29 samples/sec   Loss 7.1159   LearningRate 0.0297   Epoch: 9   Global Step: 113080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:11,522-Speed 3046.13 samples/sec   Loss 7.0669   LearningRate 0.0297   Epoch: 9   Global Step: 113090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:14,950-Speed 2987.94 samples/sec   Loss 7.1646   LearningRate 0.0297   Epoch: 9   Global Step: 113100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:25:18,278-Speed 3078.31 samples/sec   Loss 7.1250   LearningRate 0.0297   Epoch: 9   Global Step: 113110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:21,644-Speed 3043.03 samples/sec   Loss 7.0452   LearningRate 0.0297   Epoch: 9   Global Step: 113120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:25,086-Speed 2975.60 samples/sec   Loss 6.9864   LearningRate 0.0297   Epoch: 9   Global Step: 113130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:28,507-Speed 2994.17 samples/sec   Loss 7.0119   LearningRate 0.0297   Epoch: 9   Global Step: 113140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:31,907-Speed 3013.39 samples/sec   Loss 7.1296   LearningRate 0.0297   Epoch: 9   Global Step: 113150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:35,237-Speed 3075.89 samples/sec   Loss 7.0273   LearningRate 0.0296   Epoch: 9   Global Step: 113160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:38,597-Speed 3048.71 samples/sec   Loss 7.0722   LearningRate 0.0296   Epoch: 9   Global Step: 113170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:41,933-Speed 3070.43 samples/sec   Loss 7.1007   LearningRate 0.0296   Epoch: 9   Global Step: 113180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:45,324-Speed 3020.16 samples/sec   Loss 6.9916   LearningRate 0.0296   Epoch: 9   Global Step: 113190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:48,712-Speed 3023.59 samples/sec   Loss 7.0803   LearningRate 0.0296   Epoch: 9   Global Step: 113200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:25:52,038-Speed 3079.52 samples/sec   Loss 7.1766   LearningRate 0.0296   Epoch: 9   Global Step: 113210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:25:55,360-Speed 3083.44 samples/sec   Loss 7.1478   LearningRate 0.0296   Epoch: 9   Global Step: 113220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:25:58,736-Speed 3034.25 samples/sec   Loss 7.0586   LearningRate 0.0296   Epoch: 9   Global Step: 113230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:26:02,078-Speed 3065.10 samples/sec   Loss 7.1072   LearningRate 0.0296   Epoch: 9   Global Step: 113240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:26:05,499-Speed 2993.99 samples/sec   Loss 7.0758   LearningRate 0.0296   Epoch: 9   Global Step: 113250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:26:08,916-Speed 2998.06 samples/sec   Loss 7.1129   LearningRate 0.0296   Epoch: 9   Global Step: 113260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:26:12,310-Speed 3017.63 samples/sec   Loss 7.2428   LearningRate 0.0296   Epoch: 9   Global Step: 113270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:26:15,679-Speed 3039.95 samples/sec   Loss 7.0671   LearningRate 0.0296   Epoch: 9   Global Step: 113280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:26:19,080-Speed 3012.08 samples/sec   Loss 7.1938   LearningRate 0.0296   Epoch: 9   Global Step: 113290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:26:22,534-Speed 2965.56 samples/sec   Loss 7.1532   LearningRate 0.0296   Epoch: 9   Global Step: 113300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:26:25,930-Speed 3016.08 samples/sec   Loss 7.2318   LearningRate 0.0296   Epoch: 9   Global Step: 113310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:26:29,317-Speed 3024.48 samples/sec   Loss 7.0883   LearningRate 0.0296   Epoch: 9   Global Step: 113320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:26:32,766-Speed 2969.84 samples/sec   Loss 7.0965   LearningRate 0.0296   Epoch: 9   Global Step: 113330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:26:36,191-Speed 2989.83 samples/sec   Loss 7.0632   LearningRate 0.0296   Epoch: 9   Global Step: 113340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:26:39,534-Speed 3064.42 samples/sec   Loss 7.1352   LearningRate 0.0296   Epoch: 9   Global Step: 113350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:26:42,962-Speed 2988.24 samples/sec   Loss 7.2351   LearningRate 0.0296   Epoch: 9   Global Step: 113360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:26:46,348-Speed 3025.39 samples/sec   Loss 7.1162   LearningRate 0.0296   Epoch: 9   Global Step: 113370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:26:49,711-Speed 3045.26 samples/sec   Loss 7.1663   LearningRate 0.0295   Epoch: 9   Global Step: 113380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:26:53,142-Speed 2985.80 samples/sec   Loss 7.0081   LearningRate 0.0295   Epoch: 9   Global Step: 113390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:26:56,509-Speed 3041.83 samples/sec   Loss 7.0612   LearningRate 0.0295   Epoch: 9   Global Step: 113400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:26:59,888-Speed 3031.06 samples/sec   Loss 7.1036   LearningRate 0.0295   Epoch: 9   Global Step: 113410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:27:03,266-Speed 3033.32 samples/sec   Loss 7.1947   LearningRate 0.0295   Epoch: 9   Global Step: 113420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:27:06,664-Speed 3013.53 samples/sec   Loss 7.1838   LearningRate 0.0295   Epoch: 9   Global Step: 113430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:27:10,091-Speed 2988.84 samples/sec   Loss 7.1192   LearningRate 0.0295   Epoch: 9   Global Step: 113440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:27:13,550-Speed 2962.36 samples/sec   Loss 7.1569   LearningRate 0.0295   Epoch: 9   Global Step: 113450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:27:16,946-Speed 3016.56 samples/sec   Loss 7.1310   LearningRate 0.0295   Epoch: 9   Global Step: 113460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:27:20,295-Speed 3058.48 samples/sec   Loss 7.1263   LearningRate 0.0295   Epoch: 9   Global Step: 113470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:27:23,730-Speed 2982.52 samples/sec   Loss 7.3420   LearningRate 0.0295   Epoch: 9   Global Step: 113480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:27:27,109-Speed 3031.12 samples/sec   Loss 7.2285   LearningRate 0.0295   Epoch: 9   Global Step: 113490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:27:30,576-Speed 2954.20 samples/sec   Loss 7.3348   LearningRate 0.0295   Epoch: 9   Global Step: 113500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:27:34,018-Speed 2975.39 samples/sec   Loss 7.1156   LearningRate 0.0295   Epoch: 9   Global Step: 113510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:27:37,435-Speed 2997.74 samples/sec   Loss 7.1829   LearningRate 0.0295   Epoch: 9   Global Step: 113520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:27:40,771-Speed 3070.56 samples/sec   Loss 7.2055   LearningRate 0.0295   Epoch: 9   Global Step: 113530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:27:44,197-Speed 2989.80 samples/sec   Loss 7.1175   LearningRate 0.0295   Epoch: 9   Global Step: 113540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:27:47,517-Speed 3085.72 samples/sec   Loss 7.0734   LearningRate 0.0295   Epoch: 9   Global Step: 113550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:27:50,847-Speed 3075.70 samples/sec   Loss 7.1051   LearningRate 0.0295   Epoch: 9   Global Step: 113560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:27:54,236-Speed 3022.88 samples/sec   Loss 7.2508   LearningRate 0.0295   Epoch: 9   Global Step: 113570   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:27:57,605-Speed 3040.46 samples/sec   Loss 7.2260   LearningRate 0.0295   Epoch: 9   Global Step: 113580   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:28:00,954-Speed 3058.81 samples/sec   Loss 7.2640   LearningRate 0.0295   Epoch: 9   Global Step: 113590   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:28:04,326-Speed 3037.31 samples/sec   Loss 7.1846   LearningRate 0.0295   Epoch: 9   Global Step: 113600   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:28:07,716-Speed 3021.69 samples/sec   Loss 7.2019   LearningRate 0.0294   Epoch: 9   Global Step: 113610   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:28:11,113-Speed 3015.43 samples/sec   Loss 7.3244   LearningRate 0.0294   Epoch: 9   Global Step: 113620   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:28:14,478-Speed 3043.87 samples/sec   Loss 7.2249   LearningRate 0.0294   Epoch: 9   Global Step: 113630   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:28:17,870-Speed 3019.50 samples/sec   Loss 7.1850   LearningRate 0.0294   Epoch: 9   Global Step: 113640   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:28:21,308-Speed 2979.57 samples/sec   Loss 7.2390   LearningRate 0.0294   Epoch: 9   Global Step: 113650   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:28:24,665-Speed 3050.91 samples/sec   Loss 7.3730   LearningRate 0.0294   Epoch: 9   Global Step: 113660   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:28:28,074-Speed 3004.78 samples/sec   Loss 7.1443   LearningRate 0.0294   Epoch: 9   Global Step: 113670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:28:31,454-Speed 3030.90 samples/sec   Loss 7.3553   LearningRate 0.0294   Epoch: 9   Global Step: 113680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:28:34,896-Speed 2975.67 samples/sec   Loss 7.2832   LearningRate 0.0294   Epoch: 9   Global Step: 113690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:28:38,247-Speed 3057.03 samples/sec   Loss 7.2886   LearningRate 0.0294   Epoch: 9   Global Step: 113700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:28:41,654-Speed 3006.32 samples/sec   Loss 7.2738   LearningRate 0.0294   Epoch: 9   Global Step: 113710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:28:45,021-Speed 3041.92 samples/sec   Loss 7.1856   LearningRate 0.0294   Epoch: 9   Global Step: 113720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:28:48,431-Speed 3004.12 samples/sec   Loss 7.2699   LearningRate 0.0294   Epoch: 9   Global Step: 113730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:28:51,806-Speed 3034.76 samples/sec   Loss 7.1124   LearningRate 0.0294   Epoch: 9   Global Step: 113740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:28:55,163-Speed 3051.50 samples/sec   Loss 7.2789   LearningRate 0.0294   Epoch: 9   Global Step: 113750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:28:58,558-Speed 3016.83 samples/sec   Loss 7.2744   LearningRate 0.0294   Epoch: 9   Global Step: 113760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:29:01,981-Speed 2993.23 samples/sec   Loss 7.3472   LearningRate 0.0294   Epoch: 9   Global Step: 113770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:29:05,359-Speed 3032.23 samples/sec   Loss 7.2867   LearningRate 0.0294   Epoch: 9   Global Step: 113780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:29:08,763-Speed 3008.38 samples/sec   Loss 7.2985   LearningRate 0.0294   Epoch: 9   Global Step: 113790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:29:12,118-Speed 3053.98 samples/sec   Loss 7.2076   LearningRate 0.0294   Epoch: 9   Global Step: 113800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:29:15,492-Speed 3035.83 samples/sec   Loss 7.3776   LearningRate 0.0294   Epoch: 9   Global Step: 113810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:29:18,883-Speed 3020.79 samples/sec   Loss 7.2823   LearningRate 0.0294   Epoch: 9   Global Step: 113820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:29:22,336-Speed 2966.14 samples/sec   Loss 7.2851   LearningRate 0.0294   Epoch: 9   Global Step: 113830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:29:25,696-Speed 3048.39 samples/sec   Loss 7.2086   LearningRate 0.0293   Epoch: 9   Global Step: 113840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:29:29,095-Speed 3013.80 samples/sec   Loss 7.2723   LearningRate 0.0293   Epoch: 9   Global Step: 113850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:29:32,538-Speed 2975.01 samples/sec   Loss 7.2963   LearningRate 0.0293   Epoch: 9   Global Step: 113860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:29:35,934-Speed 3015.70 samples/sec   Loss 7.2096   LearningRate 0.0293   Epoch: 9   Global Step: 113870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:29:39,388-Speed 2965.47 samples/sec   Loss 7.3285   LearningRate 0.0293   Epoch: 9   Global Step: 113880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:29:42,755-Speed 3043.06 samples/sec   Loss 7.3431   LearningRate 0.0293   Epoch: 9   Global Step: 113890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:29:46,095-Speed 3066.21 samples/sec   Loss 7.3955   LearningRate 0.0293   Epoch: 9   Global Step: 113900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:29:49,488-Speed 3019.10 samples/sec   Loss 7.2447   LearningRate 0.0293   Epoch: 9   Global Step: 113910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:29:52,977-Speed 2935.90 samples/sec   Loss 7.3232   LearningRate 0.0293   Epoch: 9   Global Step: 113920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:29:56,373-Speed 3015.54 samples/sec   Loss 7.1996   LearningRate 0.0293   Epoch: 9   Global Step: 113930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:29:59,785-Speed 3002.34 samples/sec   Loss 7.3209   LearningRate 0.0293   Epoch: 9   Global Step: 113940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:03,229-Speed 2975.26 samples/sec   Loss 7.4672   LearningRate 0.0293   Epoch: 9   Global Step: 113950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:06,592-Speed 3045.71 samples/sec   Loss 7.3792   LearningRate 0.0293   Epoch: 9   Global Step: 113960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:10,023-Speed 2985.62 samples/sec   Loss 7.3477   LearningRate 0.0293   Epoch: 9   Global Step: 113970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:13,400-Speed 3032.97 samples/sec   Loss 7.3796   LearningRate 0.0293   Epoch: 9   Global Step: 113980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:16,800-Speed 3013.32 samples/sec   Loss 7.2766   LearningRate 0.0293   Epoch: 9   Global Step: 113990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:30:20,223-Speed 2992.03 samples/sec   Loss 7.3621   LearningRate 0.0293   Epoch: 9   Global Step: 114000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:30:23,600-Speed 3033.60 samples/sec   Loss 7.2371   LearningRate 0.0293   Epoch: 9   Global Step: 114010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:30:26,996-Speed 3015.49 samples/sec   Loss 7.2809   LearningRate 0.0293   Epoch: 9   Global Step: 114020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:30,423-Speed 2989.66 samples/sec   Loss 7.2994   LearningRate 0.0293   Epoch: 9   Global Step: 114030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:33,814-Speed 3019.93 samples/sec   Loss 7.4804   LearningRate 0.0293   Epoch: 9   Global Step: 114040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:37,212-Speed 3014.98 samples/sec   Loss 7.4406   LearningRate 0.0293   Epoch: 9   Global Step: 114050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:40,669-Speed 2962.63 samples/sec   Loss 7.3907   LearningRate 0.0293   Epoch: 9   Global Step: 114060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:44,044-Speed 3034.94 samples/sec   Loss 7.2428   LearningRate 0.0292   Epoch: 9   Global Step: 114070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:47,553-Speed 2919.61 samples/sec   Loss 7.3193   LearningRate 0.0292   Epoch: 9   Global Step: 114080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:50,904-Speed 3056.82 samples/sec   Loss 7.3994   LearningRate 0.0292   Epoch: 9   Global Step: 114090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:54,359-Speed 2963.91 samples/sec   Loss 7.3426   LearningRate 0.0292   Epoch: 9   Global Step: 114100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:30:57,756-Speed 3015.27 samples/sec   Loss 7.3545   LearningRate 0.0292   Epoch: 9   Global Step: 114110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:31:01,197-Speed 2977.25 samples/sec   Loss 7.4623   LearningRate 0.0292   Epoch: 9   Global Step: 114120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:04,663-Speed 2954.72 samples/sec   Loss 7.2393   LearningRate 0.0292   Epoch: 9   Global Step: 114130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:08,082-Speed 2995.65 samples/sec   Loss 7.4920   LearningRate 0.0292   Epoch: 9   Global Step: 114140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:11,406-Speed 3082.52 samples/sec   Loss 7.3044   LearningRate 0.0292   Epoch: 9   Global Step: 114150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:14,849-Speed 2974.99 samples/sec   Loss 7.3484   LearningRate 0.0292   Epoch: 9   Global Step: 114160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:18,213-Speed 3045.44 samples/sec   Loss 7.3410   LearningRate 0.0292   Epoch: 9   Global Step: 114170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:21,555-Speed 3065.28 samples/sec   Loss 7.4940   LearningRate 0.0292   Epoch: 9   Global Step: 114180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:24,959-Speed 3008.96 samples/sec   Loss 7.3464   LearningRate 0.0292   Epoch: 9   Global Step: 114190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:28,328-Speed 3040.28 samples/sec   Loss 7.2180   LearningRate 0.0292   Epoch: 9   Global Step: 114200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:31,746-Speed 2997.23 samples/sec   Loss 7.3800   LearningRate 0.0292   Epoch: 9   Global Step: 114210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:35,132-Speed 3024.32 samples/sec   Loss 7.4033   LearningRate 0.0292   Epoch: 9   Global Step: 114220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:31:38,532-Speed 3013.01 samples/sec   Loss 7.3566   LearningRate 0.0292   Epoch: 9   Global Step: 114230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:41,952-Speed 2995.26 samples/sec   Loss 7.4440   LearningRate 0.0292   Epoch: 9   Global Step: 114240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:45,352-Speed 3012.17 samples/sec   Loss 7.3293   LearningRate 0.0292   Epoch: 9   Global Step: 114250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:31:48,719-Speed 3041.73 samples/sec   Loss 7.4573   LearningRate 0.0292   Epoch: 9   Global Step: 114260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:31:52,057-Speed 3069.23 samples/sec   Loss 7.3712   LearningRate 0.0292   Epoch: 9   Global Step: 114270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:31:55,386-Speed 3076.36 samples/sec   Loss 7.4564   LearningRate 0.0292   Epoch: 9   Global Step: 114280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:31:58,813-Speed 2989.54 samples/sec   Loss 7.4360   LearningRate 0.0292   Epoch: 9   Global Step: 114290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:32:02,173-Speed 3049.23 samples/sec   Loss 7.3503   LearningRate 0.0291   Epoch: 9   Global Step: 114300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:32:05,571-Speed 3013.51 samples/sec   Loss 7.4036   LearningRate 0.0291   Epoch: 9   Global Step: 114310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:32:08,973-Speed 3011.07 samples/sec   Loss 7.3571   LearningRate 0.0291   Epoch: 9   Global Step: 114320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:32:12,415-Speed 2976.50 samples/sec   Loss 7.4070   LearningRate 0.0291   Epoch: 9   Global Step: 114330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:32:15,781-Speed 3042.42 samples/sec   Loss 7.2874   LearningRate 0.0291   Epoch: 9   Global Step: 114340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:32:19,180-Speed 3014.14 samples/sec   Loss 7.3887   LearningRate 0.0291   Epoch: 9   Global Step: 114350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:32:22,520-Speed 3066.95 samples/sec   Loss 7.4442   LearningRate 0.0291   Epoch: 9   Global Step: 114360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:32:25,913-Speed 3018.53 samples/sec   Loss 7.4158   LearningRate 0.0291   Epoch: 9   Global Step: 114370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:32:29,282-Speed 3040.65 samples/sec   Loss 7.4356   LearningRate 0.0291   Epoch: 9   Global Step: 114380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:32:32,640-Speed 3049.66 samples/sec   Loss 7.4283   LearningRate 0.0291   Epoch: 9   Global Step: 114390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:32:36,144-Speed 2923.49 samples/sec   Loss 7.4034   LearningRate 0.0291   Epoch: 9   Global Step: 114400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:32:39,532-Speed 3023.52 samples/sec   Loss 7.5261   LearningRate 0.0291   Epoch: 9   Global Step: 114410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:32:42,937-Speed 3008.22 samples/sec   Loss 7.4379   LearningRate 0.0291   Epoch: 9   Global Step: 114420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:32:46,355-Speed 2996.63 samples/sec   Loss 7.2643   LearningRate 0.0291   Epoch: 9   Global Step: 114430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:32:49,817-Speed 2958.30 samples/sec   Loss 7.5494   LearningRate 0.0291   Epoch: 9   Global Step: 114440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:32:53,271-Speed 2965.58 samples/sec   Loss 7.4718   LearningRate 0.0291   Epoch: 9   Global Step: 114450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:32:56,678-Speed 3006.68 samples/sec   Loss 7.4253   LearningRate 0.0291   Epoch: 9   Global Step: 114460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:33:00,083-Speed 3008.87 samples/sec   Loss 7.6373   LearningRate 0.0291   Epoch: 9   Global Step: 114470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:33:03,470-Speed 3024.17 samples/sec   Loss 7.4461   LearningRate 0.0291   Epoch: 9   Global Step: 114480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:33:06,903-Speed 2983.38 samples/sec   Loss 7.3708   LearningRate 0.0291   Epoch: 9   Global Step: 114490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:33:10,300-Speed 3015.37 samples/sec   Loss 7.3815   LearningRate 0.0291   Epoch: 9   Global Step: 114500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:33:13,745-Speed 2972.96 samples/sec   Loss 7.4346   LearningRate 0.0291   Epoch: 9   Global Step: 114510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:33:17,192-Speed 2971.79 samples/sec   Loss 7.3959   LearningRate 0.0291   Epoch: 9   Global Step: 114520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:33:20,602-Speed 3003.84 samples/sec   Loss 7.3291   LearningRate 0.0290   Epoch: 9   Global Step: 114530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:33:23,916-Speed 3090.70 samples/sec   Loss 7.5401   LearningRate 0.0290   Epoch: 9   Global Step: 114540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:33:27,342-Speed 2989.88 samples/sec   Loss 7.5886   LearningRate 0.0290   Epoch: 9   Global Step: 114550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:33:31,581-Speed 2416.19 samples/sec   Loss 7.3909   LearningRate 0.0290   Epoch: 9   Global Step: 114560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:33:34,999-Speed 2996.78 samples/sec   Loss 7.6059   LearningRate 0.0290   Epoch: 9   Global Step: 114570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:33:38,440-Speed 2976.40 samples/sec   Loss 7.4732   LearningRate 0.0290   Epoch: 9   Global Step: 114580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:33:41,751-Speed 3093.02 samples/sec   Loss 7.5358   LearningRate 0.0290   Epoch: 9   Global Step: 114590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:33:45,094-Speed 3064.85 samples/sec   Loss 7.5576   LearningRate 0.0290   Epoch: 9   Global Step: 114600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:33:48,516-Speed 2992.87 samples/sec   Loss 7.3029   LearningRate 0.0290   Epoch: 9   Global Step: 114610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:33:51,917-Speed 3012.38 samples/sec   Loss 7.4659   LearningRate 0.0290   Epoch: 9   Global Step: 114620   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:33:55,294-Speed 3032.54 samples/sec   Loss 7.4578   LearningRate 0.0290   Epoch: 9   Global Step: 114630   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:33:58,727-Speed 2984.27 samples/sec   Loss 7.3146   LearningRate 0.0290   Epoch: 9   Global Step: 114640   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:34:02,150-Speed 2992.06 samples/sec   Loss 7.4248   LearningRate 0.0290   Epoch: 9   Global Step: 114650   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:34:05,565-Speed 2999.95 samples/sec   Loss 7.3980   LearningRate 0.0290   Epoch: 9   Global Step: 114660   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:34:09,042-Speed 2945.46 samples/sec   Loss 7.4827   LearningRate 0.0290   Epoch: 9   Global Step: 114670   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:34:12,384-Speed 3064.95 samples/sec   Loss 7.5227   LearningRate 0.0290   Epoch: 9   Global Step: 114680   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:34:15,859-Speed 2947.21 samples/sec   Loss 7.4786   LearningRate 0.0290   Epoch: 9   Global Step: 114690   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:34:19,222-Speed 3046.81 samples/sec   Loss 7.4632   LearningRate 0.0290   Epoch: 9   Global Step: 114700   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:34:22,612-Speed 3021.24 samples/sec   Loss 7.4379   LearningRate 0.0290   Epoch: 9   Global Step: 114710   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 12:34:25,977-Speed 3043.78 samples/sec   Loss 7.5226   LearningRate 0.0290   Epoch: 9   Global Step: 114720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:34:29,432-Speed 2965.06 samples/sec   Loss 7.3126   LearningRate 0.0290   Epoch: 9   Global Step: 114730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:34:32,858-Speed 2989.72 samples/sec   Loss 7.5176   LearningRate 0.0290   Epoch: 9   Global Step: 114740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:34:36,270-Speed 3001.25 samples/sec   Loss 7.5849   LearningRate 0.0290   Epoch: 9   Global Step: 114750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:34:39,680-Speed 3004.84 samples/sec   Loss 7.2877   LearningRate 0.0289   Epoch: 9   Global Step: 114760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:34:43,081-Speed 3013.47 samples/sec   Loss 7.4654   LearningRate 0.0289   Epoch: 9   Global Step: 114770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:34:46,498-Speed 2997.75 samples/sec   Loss 7.5481   LearningRate 0.0289   Epoch: 9   Global Step: 114780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:34:49,855-Speed 3051.15 samples/sec   Loss 7.5586   LearningRate 0.0289   Epoch: 9   Global Step: 114790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:34:53,326-Speed 2951.52 samples/sec   Loss 7.3615   LearningRate 0.0289   Epoch: 9   Global Step: 114800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:34:56,689-Speed 3045.17 samples/sec   Loss 7.4024   LearningRate 0.0289   Epoch: 9   Global Step: 114810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:35:00,051-Speed 3047.61 samples/sec   Loss 7.3972   LearningRate 0.0289   Epoch: 9   Global Step: 114820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:03,445-Speed 3017.82 samples/sec   Loss 7.4529   LearningRate 0.0289   Epoch: 9   Global Step: 114830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:06,877-Speed 2984.26 samples/sec   Loss 7.3257   LearningRate 0.0289   Epoch: 9   Global Step: 114840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:10,349-Speed 2950.92 samples/sec   Loss 7.4385   LearningRate 0.0289   Epoch: 9   Global Step: 114850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:13,802-Speed 2965.64 samples/sec   Loss 7.5391   LearningRate 0.0289   Epoch: 9   Global Step: 114860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:17,211-Speed 3004.80 samples/sec   Loss 7.5361   LearningRate 0.0289   Epoch: 9   Global Step: 114870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:20,588-Speed 3033.37 samples/sec   Loss 7.5166   LearningRate 0.0289   Epoch: 9   Global Step: 114880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:23,942-Speed 3053.53 samples/sec   Loss 7.5493   LearningRate 0.0289   Epoch: 9   Global Step: 114890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:27,308-Speed 3043.30 samples/sec   Loss 7.4521   LearningRate 0.0289   Epoch: 9   Global Step: 114900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:30,667-Speed 3052.01 samples/sec   Loss 7.4100   LearningRate 0.0289   Epoch: 9   Global Step: 114910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:34,067-Speed 3012.47 samples/sec   Loss 7.4143   LearningRate 0.0289   Epoch: 9   Global Step: 114920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:35:37,471-Speed 3008.84 samples/sec   Loss 7.5439   LearningRate 0.0289   Epoch: 9   Global Step: 114930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:40,867-Speed 3016.65 samples/sec   Loss 7.5565   LearningRate 0.0289   Epoch: 9   Global Step: 114940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:44,233-Speed 3042.88 samples/sec   Loss 7.4772   LearningRate 0.0289   Epoch: 9   Global Step: 114950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:47,674-Speed 2976.74 samples/sec   Loss 7.4859   LearningRate 0.0289   Epoch: 9   Global Step: 114960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:51,115-Speed 2976.36 samples/sec   Loss 7.5385   LearningRate 0.0289   Epoch: 9   Global Step: 114970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:54,496-Speed 3030.37 samples/sec   Loss 7.5052   LearningRate 0.0289   Epoch: 9   Global Step: 114980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:35:57,954-Speed 2961.52 samples/sec   Loss 7.4298   LearningRate 0.0288   Epoch: 9   Global Step: 114990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:01,450-Speed 2929.90 samples/sec   Loss 7.4202   LearningRate 0.0288   Epoch: 9   Global Step: 115000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:04,902-Speed 2967.76 samples/sec   Loss 7.5140   LearningRate 0.0288   Epoch: 9   Global Step: 115010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:08,333-Speed 2984.53 samples/sec   Loss 7.6452   LearningRate 0.0288   Epoch: 9   Global Step: 115020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:11,728-Speed 3017.36 samples/sec   Loss 7.5374   LearningRate 0.0288   Epoch: 9   Global Step: 115030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:36:15,055-Speed 3078.89 samples/sec   Loss 7.4632   LearningRate 0.0288   Epoch: 9   Global Step: 115040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:18,468-Speed 3000.96 samples/sec   Loss 7.5390   LearningRate 0.0288   Epoch: 9   Global Step: 115050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:21,885-Speed 2998.05 samples/sec   Loss 7.5682   LearningRate 0.0288   Epoch: 9   Global Step: 115060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:25,333-Speed 2970.67 samples/sec   Loss 7.5788   LearningRate 0.0288   Epoch: 9   Global Step: 115070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:28,767-Speed 2982.22 samples/sec   Loss 7.6619   LearningRate 0.0288   Epoch: 9   Global Step: 115080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:32,192-Speed 2991.45 samples/sec   Loss 7.5350   LearningRate 0.0288   Epoch: 9   Global Step: 115090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:35,588-Speed 3016.00 samples/sec   Loss 7.5774   LearningRate 0.0288   Epoch: 9   Global Step: 115100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:39,005-Speed 2996.87 samples/sec   Loss 7.3629   LearningRate 0.0288   Epoch: 9   Global Step: 115110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:42,424-Speed 2996.31 samples/sec   Loss 7.5341   LearningRate 0.0288   Epoch: 9   Global Step: 115120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:36:45,758-Speed 3072.08 samples/sec   Loss 7.5562   LearningRate 0.0288   Epoch: 9   Global Step: 115130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:36:49,246-Speed 2936.98 samples/sec   Loss 7.5071   LearningRate 0.0288   Epoch: 9   Global Step: 115140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:36:52,607-Speed 3047.38 samples/sec   Loss 7.4293   LearningRate 0.0288   Epoch: 9   Global Step: 115150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:36:56,025-Speed 2996.98 samples/sec   Loss 7.5273   LearningRate 0.0288   Epoch: 9   Global Step: 115160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:36:59,449-Speed 2991.58 samples/sec   Loss 7.5251   LearningRate 0.0288   Epoch: 9   Global Step: 115170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:37:02,880-Speed 2985.10 samples/sec   Loss 7.4382   LearningRate 0.0288   Epoch: 9   Global Step: 115180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:37:06,255-Speed 3034.65 samples/sec   Loss 7.5936   LearningRate 0.0288   Epoch: 9   Global Step: 115190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:37:09,635-Speed 3030.57 samples/sec   Loss 7.4587   LearningRate 0.0288   Epoch: 9   Global Step: 115200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:37:13,041-Speed 3007.33 samples/sec   Loss 7.4953   LearningRate 0.0288   Epoch: 9   Global Step: 115210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:37:16,478-Speed 2980.45 samples/sec   Loss 7.5511   LearningRate 0.0287   Epoch: 9   Global Step: 115220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:37:19,910-Speed 2984.54 samples/sec   Loss 7.4222   LearningRate 0.0287   Epoch: 9   Global Step: 115230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:37:23,365-Speed 2964.69 samples/sec   Loss 7.6536   LearningRate 0.0287   Epoch: 9   Global Step: 115240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:37:26,827-Speed 2958.65 samples/sec   Loss 7.5147   LearningRate 0.0287   Epoch: 9   Global Step: 115250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:37:30,253-Speed 2990.32 samples/sec   Loss 7.5128   LearningRate 0.0287   Epoch: 9   Global Step: 115260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:37:33,664-Speed 3002.56 samples/sec   Loss 7.5413   LearningRate 0.0287   Epoch: 9   Global Step: 115270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:37:37,020-Speed 3052.32 samples/sec   Loss 7.5670   LearningRate 0.0287   Epoch: 9   Global Step: 115280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:37:40,409-Speed 3021.88 samples/sec   Loss 7.6118   LearningRate 0.0287   Epoch: 9   Global Step: 115290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:37:43,795-Speed 3025.05 samples/sec   Loss 7.6085   LearningRate 0.0287   Epoch: 9   Global Step: 115300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:37:47,208-Speed 3001.17 samples/sec   Loss 7.5665   LearningRate 0.0287   Epoch: 9   Global Step: 115310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:37:50,612-Speed 3009.34 samples/sec   Loss 7.5703   LearningRate 0.0287   Epoch: 9   Global Step: 115320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:37:54,048-Speed 2981.33 samples/sec   Loss 7.5029   LearningRate 0.0287   Epoch: 9   Global Step: 115330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:37:57,396-Speed 3059.26 samples/sec   Loss 7.4526   LearningRate 0.0287   Epoch: 9   Global Step: 115340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:38:00,760-Speed 3045.22 samples/sec   Loss 7.5613   LearningRate 0.0287   Epoch: 9   Global Step: 115350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:38:04,219-Speed 2960.55 samples/sec   Loss 7.6688   LearningRate 0.0287   Epoch: 9   Global Step: 115360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:38:07,635-Speed 2998.23 samples/sec   Loss 7.6232   LearningRate 0.0287   Epoch: 9   Global Step: 115370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:38:11,153-Speed 2911.63 samples/sec   Loss 7.4790   LearningRate 0.0287   Epoch: 9   Global Step: 115380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:38:14,520-Speed 3042.38 samples/sec   Loss 7.6894   LearningRate 0.0287   Epoch: 9   Global Step: 115390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:38:17,935-Speed 2999.69 samples/sec   Loss 7.5081   LearningRate 0.0287   Epoch: 9   Global Step: 115400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 12:38:21,278-Speed 3063.46 samples/sec   Loss 7.4783   LearningRate 0.0287   Epoch: 9   Global Step: 115410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:38:24,633-Speed 3053.49 samples/sec   Loss 7.6634   LearningRate 0.0287   Epoch: 9   Global Step: 115420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:38:28,031-Speed 3013.84 samples/sec   Loss 7.4778   LearningRate 0.0287   Epoch: 9   Global Step: 115430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:38:31,452-Speed 2994.03 samples/sec   Loss 7.5237   LearningRate 0.0287   Epoch: 9   Global Step: 115440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:38:34,764-Speed 3093.66 samples/sec   Loss 7.4993   LearningRate 0.0287   Epoch: 9   Global Step: 115450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:38:38,082-Speed 3086.69 samples/sec   Loss 7.5522   LearningRate 0.0286   Epoch: 9   Global Step: 115460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:38:41,523-Speed 2976.51 samples/sec   Loss 7.5156   LearningRate 0.0286   Epoch: 9   Global Step: 115470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:38:45,020-Speed 2929.21 samples/sec   Loss 7.4734   LearningRate 0.0286   Epoch: 9   Global Step: 115480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:38:48,396-Speed 3034.22 samples/sec   Loss 7.6490   LearningRate 0.0286   Epoch: 9   Global Step: 115490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:38:51,757-Speed 3048.18 samples/sec   Loss 7.6022   LearningRate 0.0286   Epoch: 9   Global Step: 115500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:38:55,205-Speed 2969.83 samples/sec   Loss 7.5261   LearningRate 0.0286   Epoch: 9   Global Step: 115510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:38:58,687-Speed 2942.70 samples/sec   Loss 7.5002   LearningRate 0.0286   Epoch: 9   Global Step: 115520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:39:02,084-Speed 3014.90 samples/sec   Loss 7.6495   LearningRate 0.0286   Epoch: 9   Global Step: 115530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:39:05,508-Speed 2991.71 samples/sec   Loss 7.5719   LearningRate 0.0286   Epoch: 9   Global Step: 115540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:39:08,855-Speed 3060.68 samples/sec   Loss 7.6938   LearningRate 0.0286   Epoch: 9   Global Step: 115550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:39:12,172-Speed 3087.75 samples/sec   Loss 7.4765   LearningRate 0.0286   Epoch: 9   Global Step: 115560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:39:15,529-Speed 3051.12 samples/sec   Loss 7.6321   LearningRate 0.0286   Epoch: 9   Global Step: 115570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:39:18,896-Speed 3042.36 samples/sec   Loss 7.5952   LearningRate 0.0286   Epoch: 9   Global Step: 115580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:39:22,305-Speed 3004.79 samples/sec   Loss 7.4861   LearningRate 0.0286   Epoch: 9   Global Step: 115590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:39:25,745-Speed 2977.18 samples/sec   Loss 7.7208   LearningRate 0.0286   Epoch: 9   Global Step: 115600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:39:29,106-Speed 3047.69 samples/sec   Loss 7.5618   LearningRate 0.0286   Epoch: 9   Global Step: 115610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:39:32,556-Speed 2970.10 samples/sec   Loss 7.4910   LearningRate 0.0286   Epoch: 9   Global Step: 115620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:39:35,957-Speed 3011.19 samples/sec   Loss 7.5717   LearningRate 0.0286   Epoch: 9   Global Step: 115630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:39:39,375-Speed 2996.78 samples/sec   Loss 7.6996   LearningRate 0.0286   Epoch: 9   Global Step: 115640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:39:42,789-Speed 3000.23 samples/sec   Loss 7.5627   LearningRate 0.0286   Epoch: 9   Global Step: 115650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:39:46,180-Speed 3020.39 samples/sec   Loss 7.6010   LearningRate 0.0286   Epoch: 9   Global Step: 115660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:39:49,688-Speed 2920.32 samples/sec   Loss 7.6363   LearningRate 0.0286   Epoch: 9   Global Step: 115670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 12:39:53,136-Speed 2971.05 samples/sec   Loss 7.6198   LearningRate 0.0286   Epoch: 9   Global Step: 115680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:39:56,583-Speed 2971.00 samples/sec   Loss 7.5398   LearningRate 0.0285   Epoch: 9   Global Step: 115690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:39:59,997-Speed 3000.46 samples/sec   Loss 7.4595   LearningRate 0.0285   Epoch: 9   Global Step: 115700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:40:03,388-Speed 3020.90 samples/sec   Loss 7.6483   LearningRate 0.0285   Epoch: 9   Global Step: 115710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:40:06,931-Speed 2890.87 samples/sec   Loss 7.6759   LearningRate 0.0285   Epoch: 9   Global Step: 115720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:40:10,250-Speed 3087.02 samples/sec   Loss 7.5388   LearningRate 0.0285   Epoch: 9   Global Step: 115730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 12:40:13,607-Speed 3050.87 samples/sec   Loss 7.7666   LearningRate 0.0285   Epoch: 9   Global Step: 115740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:40:17,023-Speed 2997.95 samples/sec   Loss 7.5569   LearningRate 0.0285   Epoch: 9   Global Step: 115750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:40:20,504-Speed 2942.29 samples/sec   Loss 7.7426   LearningRate 0.0285   Epoch: 9   Global Step: 115760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:40:23,852-Speed 3060.01 samples/sec   Loss 7.5081   LearningRate 0.0285   Epoch: 9   Global Step: 115770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:40:27,201-Speed 3058.13 samples/sec   Loss 7.5919   LearningRate 0.0285   Epoch: 9   Global Step: 115780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:40:30,553-Speed 3055.69 samples/sec   Loss 7.6290   LearningRate 0.0285   Epoch: 9   Global Step: 115790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:40:33,967-Speed 3000.21 samples/sec   Loss 7.4776   LearningRate 0.0285   Epoch: 9   Global Step: 115800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:40:37,373-Speed 3007.62 samples/sec   Loss 7.6503   LearningRate 0.0285   Epoch: 9   Global Step: 115810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:40:40,690-Speed 3088.09 samples/sec   Loss 7.4770   LearningRate 0.0285   Epoch: 9   Global Step: 115820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:40:44,026-Speed 3070.19 samples/sec   Loss 7.6820   LearningRate 0.0285   Epoch: 9   Global Step: 115830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:40:47,417-Speed 3020.78 samples/sec   Loss 7.5064   LearningRate 0.0285   Epoch: 9   Global Step: 115840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:40:50,843-Speed 2989.56 samples/sec   Loss 7.5031   LearningRate 0.0285   Epoch: 9   Global Step: 115850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:40:54,156-Speed 3091.80 samples/sec   Loss 7.4759   LearningRate 0.0285   Epoch: 9   Global Step: 115860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:40:57,490-Speed 3072.71 samples/sec   Loss 7.5422   LearningRate 0.0285   Epoch: 9   Global Step: 115870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:41:00,842-Speed 3056.02 samples/sec   Loss 7.7645   LearningRate 0.0285   Epoch: 9   Global Step: 115880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:41:04,145-Speed 3101.07 samples/sec   Loss 7.6008   LearningRate 0.0285   Epoch: 9   Global Step: 115890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:41:07,496-Speed 3056.12 samples/sec   Loss 7.4730   LearningRate 0.0285   Epoch: 9   Global Step: 115900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:41:10,815-Speed 3086.80 samples/sec   Loss 7.5495   LearningRate 0.0285   Epoch: 9   Global Step: 115910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:41:14,133-Speed 3086.80 samples/sec   Loss 7.7359   LearningRate 0.0284   Epoch: 9   Global Step: 115920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:41:17,475-Speed 3064.99 samples/sec   Loss 7.6152   LearningRate 0.0284   Epoch: 9   Global Step: 115930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:41:20,834-Speed 3049.58 samples/sec   Loss 7.5375   LearningRate 0.0284   Epoch: 9   Global Step: 115940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:41:24,167-Speed 3073.42 samples/sec   Loss 7.5601   LearningRate 0.0284   Epoch: 9   Global Step: 115950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:41:27,536-Speed 3040.37 samples/sec   Loss 7.5072   LearningRate 0.0284   Epoch: 9   Global Step: 115960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:41:30,940-Speed 3008.52 samples/sec   Loss 7.7302   LearningRate 0.0284   Epoch: 9   Global Step: 115970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:41:34,257-Speed 3088.74 samples/sec   Loss 7.6426   LearningRate 0.0284   Epoch: 9   Global Step: 115980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:41:37,582-Speed 3080.30 samples/sec   Loss 7.5630   LearningRate 0.0284   Epoch: 9   Global Step: 115990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:41:40,940-Speed 3050.24 samples/sec   Loss 7.5837   LearningRate 0.0284   Epoch: 9   Global Step: 116000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:41:44,277-Speed 3069.79 samples/sec   Loss 7.5065   LearningRate 0.0284   Epoch: 9   Global Step: 116010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:41:47,678-Speed 3012.03 samples/sec   Loss 7.6793   LearningRate 0.0284   Epoch: 9   Global Step: 116020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:41:51,029-Speed 3056.59 samples/sec   Loss 7.6261   LearningRate 0.0284   Epoch: 9   Global Step: 116030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:41:54,448-Speed 2995.44 samples/sec   Loss 7.5445   LearningRate 0.0284   Epoch: 9   Global Step: 116040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:41:57,796-Speed 3059.49 samples/sec   Loss 7.5604   LearningRate 0.0284   Epoch: 9   Global Step: 116050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:01,218-Speed 2993.58 samples/sec   Loss 7.6607   LearningRate 0.0284   Epoch: 9   Global Step: 116060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:04,545-Speed 3078.04 samples/sec   Loss 7.5325   LearningRate 0.0284   Epoch: 9   Global Step: 116070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:07,892-Speed 3059.92 samples/sec   Loss 7.6212   LearningRate 0.0284   Epoch: 9   Global Step: 116080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:11,227-Speed 3072.26 samples/sec   Loss 7.5436   LearningRate 0.0284   Epoch: 9   Global Step: 116090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:42:14,601-Speed 3035.34 samples/sec   Loss 7.5809   LearningRate 0.0284   Epoch: 9   Global Step: 116100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:18,042-Speed 2976.66 samples/sec   Loss 7.7324   LearningRate 0.0284   Epoch: 9   Global Step: 116110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:21,424-Speed 3028.42 samples/sec   Loss 7.5974   LearningRate 0.0284   Epoch: 9   Global Step: 116120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:24,822-Speed 3014.40 samples/sec   Loss 7.6352   LearningRate 0.0284   Epoch: 9   Global Step: 116130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:28,254-Speed 2985.29 samples/sec   Loss 7.6153   LearningRate 0.0284   Epoch: 9   Global Step: 116140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:31,621-Speed 3042.47 samples/sec   Loss 7.6431   LearningRate 0.0283   Epoch: 9   Global Step: 116150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:35,032-Speed 3002.97 samples/sec   Loss 7.5924   LearningRate 0.0283   Epoch: 9   Global Step: 116160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:38,445-Speed 3000.24 samples/sec   Loss 7.7274   LearningRate 0.0283   Epoch: 9   Global Step: 116170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:41,821-Speed 3034.45 samples/sec   Loss 7.4289   LearningRate 0.0283   Epoch: 9   Global Step: 116180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:45,277-Speed 2963.89 samples/sec   Loss 7.5524   LearningRate 0.0283   Epoch: 9   Global Step: 116190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:48,689-Speed 3001.93 samples/sec   Loss 7.4648   LearningRate 0.0283   Epoch: 9   Global Step: 116200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:42:52,120-Speed 2985.57 samples/sec   Loss 7.7213   LearningRate 0.0283   Epoch: 9   Global Step: 116210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:55,601-Speed 2942.22 samples/sec   Loss 7.4726   LearningRate 0.0283   Epoch: 9   Global Step: 116220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:42:59,058-Speed 2963.39 samples/sec   Loss 7.6939   LearningRate 0.0283   Epoch: 9   Global Step: 116230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:02,443-Speed 3025.96 samples/sec   Loss 7.6276   LearningRate 0.0283   Epoch: 9   Global Step: 116240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:05,945-Speed 2925.17 samples/sec   Loss 7.6060   LearningRate 0.0283   Epoch: 9   Global Step: 116250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:09,358-Speed 3000.71 samples/sec   Loss 7.6511   LearningRate 0.0283   Epoch: 9   Global Step: 116260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:12,723-Speed 3043.64 samples/sec   Loss 7.5036   LearningRate 0.0283   Epoch: 9   Global Step: 116270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:16,162-Speed 2979.02 samples/sec   Loss 7.4990   LearningRate 0.0283   Epoch: 9   Global Step: 116280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:19,586-Speed 2991.75 samples/sec   Loss 7.6400   LearningRate 0.0283   Epoch: 9   Global Step: 116290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:22,914-Speed 3076.90 samples/sec   Loss 7.6830   LearningRate 0.0283   Epoch: 9   Global Step: 116300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:26,342-Speed 2988.34 samples/sec   Loss 7.4814   LearningRate 0.0283   Epoch: 9   Global Step: 116310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:29,790-Speed 2971.00 samples/sec   Loss 7.5914   LearningRate 0.0283   Epoch: 9   Global Step: 116320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:33,285-Speed 2930.53 samples/sec   Loss 7.6338   LearningRate 0.0283   Epoch: 9   Global Step: 116330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:36,805-Speed 2909.80 samples/sec   Loss 7.6007   LearningRate 0.0283   Epoch: 9   Global Step: 116340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:40,157-Speed 3056.06 samples/sec   Loss 7.6368   LearningRate 0.0283   Epoch: 9   Global Step: 116350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:43,515-Speed 3049.95 samples/sec   Loss 7.6782   LearningRate 0.0283   Epoch: 9   Global Step: 116360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:46,884-Speed 3040.33 samples/sec   Loss 7.6299   LearningRate 0.0283   Epoch: 9   Global Step: 116370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:50,261-Speed 3033.46 samples/sec   Loss 7.6061   LearningRate 0.0283   Epoch: 9   Global Step: 116380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:53,689-Speed 2988.59 samples/sec   Loss 7.6527   LearningRate 0.0282   Epoch: 9   Global Step: 116390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:43:57,160-Speed 2951.02 samples/sec   Loss 7.5860   LearningRate 0.0282   Epoch: 9   Global Step: 116400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:44:00,573-Speed 3001.06 samples/sec   Loss 7.5434   LearningRate 0.0282   Epoch: 9   Global Step: 116410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:44:03,929-Speed 3051.83 samples/sec   Loss 7.6841   LearningRate 0.0282   Epoch: 9   Global Step: 116420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:44:07,409-Speed 2944.01 samples/sec   Loss 7.7070   LearningRate 0.0282   Epoch: 9   Global Step: 116430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:44:10,909-Speed 2926.24 samples/sec   Loss 7.5625   LearningRate 0.0282   Epoch: 9   Global Step: 116440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:44:14,282-Speed 3036.21 samples/sec   Loss 7.5819   LearningRate 0.0282   Epoch: 9   Global Step: 116450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:44:17,738-Speed 2964.28 samples/sec   Loss 7.7852   LearningRate 0.0282   Epoch: 9   Global Step: 116460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:44:21,102-Speed 3045.21 samples/sec   Loss 7.6605   LearningRate 0.0282   Epoch: 9   Global Step: 116470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:44:24,502-Speed 3012.56 samples/sec   Loss 7.7795   LearningRate 0.0282   Epoch: 9   Global Step: 116480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:44:27,911-Speed 3004.85 samples/sec   Loss 7.6994   LearningRate 0.0282   Epoch: 9   Global Step: 116490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:44:31,289-Speed 3031.70 samples/sec   Loss 7.5809   LearningRate 0.0282   Epoch: 9   Global Step: 116500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:44:34,676-Speed 3023.87 samples/sec   Loss 7.5692   LearningRate 0.0282   Epoch: 9   Global Step: 116510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:44:38,103-Speed 2989.00 samples/sec   Loss 7.6302   LearningRate 0.0282   Epoch: 9   Global Step: 116520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:44:41,498-Speed 3017.41 samples/sec   Loss 7.5163   LearningRate 0.0282   Epoch: 9   Global Step: 116530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:44:44,895-Speed 3015.48 samples/sec   Loss 7.4592   LearningRate 0.0282   Epoch: 9   Global Step: 116540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:44:48,380-Speed 2938.83 samples/sec   Loss 7.6281   LearningRate 0.0282   Epoch: 9   Global Step: 116550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:44:51,857-Speed 2946.20 samples/sec   Loss 7.6072   LearningRate 0.0282   Epoch: 9   Global Step: 116560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:44:55,243-Speed 3025.15 samples/sec   Loss 7.6185   LearningRate 0.0282   Epoch: 9   Global Step: 116570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:44:58,625-Speed 3028.87 samples/sec   Loss 7.6542   LearningRate 0.0282   Epoch: 9   Global Step: 116580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:02,065-Speed 2977.45 samples/sec   Loss 7.6900   LearningRate 0.0282   Epoch: 9   Global Step: 116590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:05,474-Speed 3004.96 samples/sec   Loss 7.6023   LearningRate 0.0282   Epoch: 9   Global Step: 116600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:08,931-Speed 2962.61 samples/sec   Loss 7.6077   LearningRate 0.0282   Epoch: 9   Global Step: 116610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:12,405-Speed 2948.70 samples/sec   Loss 7.6234   LearningRate 0.0281   Epoch: 9   Global Step: 116620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:45:15,730-Speed 3080.74 samples/sec   Loss 7.6313   LearningRate 0.0281   Epoch: 9   Global Step: 116630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:45:19,067-Speed 3069.62 samples/sec   Loss 7.5599   LearningRate 0.0281   Epoch: 9   Global Step: 116640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:22,483-Speed 2998.13 samples/sec   Loss 7.6723   LearningRate 0.0281   Epoch: 9   Global Step: 116650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:25,814-Speed 3075.22 samples/sec   Loss 7.5754   LearningRate 0.0281   Epoch: 9   Global Step: 116660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:29,204-Speed 3021.26 samples/sec   Loss 7.7361   LearningRate 0.0281   Epoch: 9   Global Step: 116670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:32,566-Speed 3047.16 samples/sec   Loss 7.6971   LearningRate 0.0281   Epoch: 9   Global Step: 116680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:35,935-Speed 3040.39 samples/sec   Loss 7.6643   LearningRate 0.0281   Epoch: 9   Global Step: 116690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:39,309-Speed 3035.90 samples/sec   Loss 7.6445   LearningRate 0.0281   Epoch: 9   Global Step: 116700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:42,735-Speed 2989.07 samples/sec   Loss 7.6354   LearningRate 0.0281   Epoch: 9   Global Step: 116710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:46,096-Speed 3048.28 samples/sec   Loss 7.7324   LearningRate 0.0281   Epoch: 9   Global Step: 116720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:49,510-Speed 2999.89 samples/sec   Loss 7.6738   LearningRate 0.0281   Epoch: 9   Global Step: 116730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:52,940-Speed 2986.59 samples/sec   Loss 7.8015   LearningRate 0.0281   Epoch: 9   Global Step: 116740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:45:56,352-Speed 3001.85 samples/sec   Loss 7.6082   LearningRate 0.0281   Epoch: 9   Global Step: 116750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:45:59,767-Speed 2999.32 samples/sec   Loss 7.6603   LearningRate 0.0281   Epoch: 9   Global Step: 116760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:46:03,174-Speed 3006.55 samples/sec   Loss 7.8193   LearningRate 0.0281   Epoch: 9   Global Step: 116770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:46:06,559-Speed 3025.59 samples/sec   Loss 7.5963   LearningRate 0.0281   Epoch: 9   Global Step: 116780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:46:10,039-Speed 2943.84 samples/sec   Loss 7.5668   LearningRate 0.0281   Epoch: 9   Global Step: 116790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:46:13,381-Speed 3065.06 samples/sec   Loss 7.6257   LearningRate 0.0281   Epoch: 9   Global Step: 116800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:46:16,791-Speed 3003.43 samples/sec   Loss 7.6076   LearningRate 0.0281   Epoch: 9   Global Step: 116810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:46:20,184-Speed 3018.81 samples/sec   Loss 7.5784   LearningRate 0.0281   Epoch: 9   Global Step: 116820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:46:23,531-Speed 3060.82 samples/sec   Loss 7.6157   LearningRate 0.0281   Epoch: 9   Global Step: 116830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:46:26,860-Speed 3076.20 samples/sec   Loss 7.7553   LearningRate 0.0281   Epoch: 9   Global Step: 116840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:46:30,299-Speed 2978.85 samples/sec   Loss 7.5529   LearningRate 0.0281   Epoch: 9   Global Step: 116850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:46:33,675-Speed 3034.33 samples/sec   Loss 7.7064   LearningRate 0.0280   Epoch: 9   Global Step: 116860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:46:37,086-Speed 3002.19 samples/sec   Loss 7.7132   LearningRate 0.0280   Epoch: 9   Global Step: 116870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:46:40,471-Speed 3026.82 samples/sec   Loss 7.6473   LearningRate 0.0280   Epoch: 9   Global Step: 116880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:46:43,897-Speed 2989.45 samples/sec   Loss 7.4982   LearningRate 0.0280   Epoch: 9   Global Step: 116890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:46:47,291-Speed 3018.13 samples/sec   Loss 7.5758   LearningRate 0.0280   Epoch: 9   Global Step: 116900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:46:50,692-Speed 3011.24 samples/sec   Loss 7.5703   LearningRate 0.0280   Epoch: 9   Global Step: 116910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:46:54,036-Speed 3063.33 samples/sec   Loss 7.6517   LearningRate 0.0280   Epoch: 9   Global Step: 116920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:46:57,452-Speed 2999.55 samples/sec   Loss 7.5050   LearningRate 0.0280   Epoch: 9   Global Step: 116930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:00,905-Speed 2966.14 samples/sec   Loss 7.7223   LearningRate 0.0280   Epoch: 9   Global Step: 116940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:04,271-Speed 3042.99 samples/sec   Loss 7.5266   LearningRate 0.0280   Epoch: 9   Global Step: 116950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:47:07,699-Speed 2987.59 samples/sec   Loss 7.5510   LearningRate 0.0280   Epoch: 9   Global Step: 116960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:11,039-Speed 3067.28 samples/sec   Loss 7.6063   LearningRate 0.0280   Epoch: 9   Global Step: 116970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:14,435-Speed 3015.73 samples/sec   Loss 7.5880   LearningRate 0.0280   Epoch: 9   Global Step: 116980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:17,838-Speed 3010.85 samples/sec   Loss 7.5189   LearningRate 0.0280   Epoch: 9   Global Step: 116990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:21,184-Speed 3060.62 samples/sec   Loss 7.5856   LearningRate 0.0280   Epoch: 9   Global Step: 117000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:24,529-Speed 3062.37 samples/sec   Loss 7.5810   LearningRate 0.0280   Epoch: 9   Global Step: 117010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:27,948-Speed 2995.77 samples/sec   Loss 7.5489   LearningRate 0.0280   Epoch: 9   Global Step: 117020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:31,328-Speed 3030.27 samples/sec   Loss 7.7158   LearningRate 0.0280   Epoch: 9   Global Step: 117030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:34,753-Speed 2990.48 samples/sec   Loss 7.6682   LearningRate 0.0280   Epoch: 9   Global Step: 117040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:38,150-Speed 3015.70 samples/sec   Loss 7.6234   LearningRate 0.0280   Epoch: 9   Global Step: 117050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:41,553-Speed 3010.49 samples/sec   Loss 7.6239   LearningRate 0.0280   Epoch: 9   Global Step: 117060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:47:44,917-Speed 3044.64 samples/sec   Loss 7.6625   LearningRate 0.0280   Epoch: 9   Global Step: 117070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:48,425-Speed 2919.80 samples/sec   Loss 7.5511   LearningRate 0.0280   Epoch: 9   Global Step: 117080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:51,775-Speed 3056.96 samples/sec   Loss 7.6183   LearningRate 0.0279   Epoch: 9   Global Step: 117090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:55,124-Speed 3059.42 samples/sec   Loss 7.6547   LearningRate 0.0279   Epoch: 9   Global Step: 117100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:47:58,477-Speed 3054.03 samples/sec   Loss 7.5650   LearningRate 0.0279   Epoch: 9   Global Step: 117110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:01,931-Speed 2965.31 samples/sec   Loss 7.6946   LearningRate 0.0279   Epoch: 9   Global Step: 117120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:05,297-Speed 3043.74 samples/sec   Loss 7.6623   LearningRate 0.0279   Epoch: 9   Global Step: 117130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:08,706-Speed 3004.68 samples/sec   Loss 7.7446   LearningRate 0.0279   Epoch: 9   Global Step: 117140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:12,168-Speed 2957.69 samples/sec   Loss 7.5971   LearningRate 0.0279   Epoch: 9   Global Step: 117150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:15,598-Speed 2986.77 samples/sec   Loss 7.6868   LearningRate 0.0279   Epoch: 9   Global Step: 117160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:19,031-Speed 2983.39 samples/sec   Loss 7.7148   LearningRate 0.0279   Epoch: 9   Global Step: 117170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:48:22,384-Speed 3055.24 samples/sec   Loss 7.6418   LearningRate 0.0279   Epoch: 9   Global Step: 117180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:25,725-Speed 3066.35 samples/sec   Loss 7.7854   LearningRate 0.0279   Epoch: 9   Global Step: 117190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:29,143-Speed 2996.67 samples/sec   Loss 7.6134   LearningRate 0.0279   Epoch: 9   Global Step: 117200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:32,575-Speed 2984.71 samples/sec   Loss 7.5048   LearningRate 0.0279   Epoch: 9   Global Step: 117210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:35,997-Speed 2993.21 samples/sec   Loss 7.5166   LearningRate 0.0279   Epoch: 9   Global Step: 117220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:39,343-Speed 3061.27 samples/sec   Loss 7.8287   LearningRate 0.0279   Epoch: 9   Global Step: 117230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:42,648-Speed 3099.48 samples/sec   Loss 7.6498   LearningRate 0.0279   Epoch: 9   Global Step: 117240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:46,049-Speed 3011.60 samples/sec   Loss 7.4910   LearningRate 0.0279   Epoch: 9   Global Step: 117250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:49,462-Speed 3000.62 samples/sec   Loss 7.6748   LearningRate 0.0279   Epoch: 9   Global Step: 117260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:52,866-Speed 3009.38 samples/sec   Loss 7.6625   LearningRate 0.0279   Epoch: 9   Global Step: 117270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:48:56,236-Speed 3038.78 samples/sec   Loss 7.6463   LearningRate 0.0279   Epoch: 9   Global Step: 117280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:48:59,546-Speed 3094.97 samples/sec   Loss 7.7634   LearningRate 0.0279   Epoch: 9   Global Step: 117290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:49:02,922-Speed 3034.41 samples/sec   Loss 7.6464   LearningRate 0.0279   Epoch: 9   Global Step: 117300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:49:06,254-Speed 3075.24 samples/sec   Loss 7.6434   LearningRate 0.0279   Epoch: 9   Global Step: 117310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:49:09,649-Speed 3016.69 samples/sec   Loss 7.5391   LearningRate 0.0279   Epoch: 9   Global Step: 117320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:49:13,064-Speed 2999.30 samples/sec   Loss 7.6903   LearningRate 0.0278   Epoch: 9   Global Step: 117330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:49:16,493-Speed 2986.73 samples/sec   Loss 7.7545   LearningRate 0.0278   Epoch: 9   Global Step: 117340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:49:19,885-Speed 3020.20 samples/sec   Loss 7.7480   LearningRate 0.0278   Epoch: 9   Global Step: 117350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:49:23,194-Speed 3096.10 samples/sec   Loss 7.6862   LearningRate 0.0278   Epoch: 9   Global Step: 117360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:49:26,527-Speed 3072.78 samples/sec   Loss 7.7737   LearningRate 0.0278   Epoch: 9   Global Step: 117370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:49:29,863-Speed 3070.05 samples/sec   Loss 7.5916   LearningRate 0.0278   Epoch: 9   Global Step: 117380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:49:33,256-Speed 3019.26 samples/sec   Loss 7.6677   LearningRate 0.0278   Epoch: 9   Global Step: 117390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:49:36,684-Speed 2987.63 samples/sec   Loss 7.5109   LearningRate 0.0278   Epoch: 9   Global Step: 117400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:49:40,059-Speed 3034.92 samples/sec   Loss 7.6554   LearningRate 0.0278   Epoch: 9   Global Step: 117410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:49:43,451-Speed 3020.47 samples/sec   Loss 7.5748   LearningRate 0.0278   Epoch: 9   Global Step: 117420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:49:46,815-Speed 3044.58 samples/sec   Loss 7.6320   LearningRate 0.0278   Epoch: 9   Global Step: 117430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:49:50,187-Speed 3037.46 samples/sec   Loss 7.6227   LearningRate 0.0278   Epoch: 9   Global Step: 117440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:49:53,613-Speed 2989.22 samples/sec   Loss 7.6266   LearningRate 0.0278   Epoch: 9   Global Step: 117450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:49:57,047-Speed 2982.85 samples/sec   Loss 7.7205   LearningRate 0.0278   Epoch: 9   Global Step: 117460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:00,540-Speed 2939.46 samples/sec   Loss 7.6887   LearningRate 0.0278   Epoch: 9   Global Step: 117470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:04,048-Speed 2919.83 samples/sec   Loss 7.5951   LearningRate 0.0278   Epoch: 9   Global Step: 117480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:07,406-Speed 3050.41 samples/sec   Loss 7.6688   LearningRate 0.0278   Epoch: 9   Global Step: 117490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:50:10,834-Speed 2988.23 samples/sec   Loss 7.7067   LearningRate 0.0278   Epoch: 9   Global Step: 117500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:14,231-Speed 3015.01 samples/sec   Loss 7.6183   LearningRate 0.0278   Epoch: 9   Global Step: 117510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:17,606-Speed 3034.71 samples/sec   Loss 7.6541   LearningRate 0.0278   Epoch: 9   Global Step: 117520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:21,031-Speed 2990.86 samples/sec   Loss 7.6181   LearningRate 0.0278   Epoch: 9   Global Step: 117530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:24,393-Speed 3048.09 samples/sec   Loss 7.5650   LearningRate 0.0278   Epoch: 9   Global Step: 117540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:27,900-Speed 2920.17 samples/sec   Loss 7.6351   LearningRate 0.0278   Epoch: 9   Global Step: 117550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:31,263-Speed 3046.58 samples/sec   Loss 7.6610   LearningRate 0.0277   Epoch: 9   Global Step: 117560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:34,604-Speed 3065.09 samples/sec   Loss 7.4679   LearningRate 0.0277   Epoch: 9   Global Step: 117570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:38,064-Speed 2960.79 samples/sec   Loss 7.6599   LearningRate 0.0277   Epoch: 9   Global Step: 117580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:41,478-Speed 3000.22 samples/sec   Loss 7.6914   LearningRate 0.0277   Epoch: 9   Global Step: 117590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:44,943-Speed 2956.24 samples/sec   Loss 7.5083   LearningRate 0.0277   Epoch: 9   Global Step: 117600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:50:48,336-Speed 3019.11 samples/sec   Loss 7.5216   LearningRate 0.0277   Epoch: 9   Global Step: 117610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:50:51,771-Speed 2982.09 samples/sec   Loss 7.7115   LearningRate 0.0277   Epoch: 9   Global Step: 117620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:50:55,109-Speed 3068.49 samples/sec   Loss 7.5492   LearningRate 0.0277   Epoch: 9   Global Step: 117630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:50:58,488-Speed 3031.38 samples/sec   Loss 7.5882   LearningRate 0.0277   Epoch: 9   Global Step: 117640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:01,880-Speed 3019.29 samples/sec   Loss 7.6327   LearningRate 0.0277   Epoch: 9   Global Step: 117650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:05,226-Speed 3061.26 samples/sec   Loss 7.5475   LearningRate 0.0277   Epoch: 9   Global Step: 117660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:08,596-Speed 3039.83 samples/sec   Loss 7.6398   LearningRate 0.0277   Epoch: 9   Global Step: 117670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:11,999-Speed 3010.10 samples/sec   Loss 7.5838   LearningRate 0.0277   Epoch: 9   Global Step: 117680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:15,449-Speed 2968.91 samples/sec   Loss 7.6666   LearningRate 0.0277   Epoch: 9   Global Step: 117690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:18,864-Speed 2999.06 samples/sec   Loss 7.7421   LearningRate 0.0277   Epoch: 9   Global Step: 117700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:22,276-Speed 3002.34 samples/sec   Loss 7.7686   LearningRate 0.0277   Epoch: 9   Global Step: 117710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:25,744-Speed 2953.37 samples/sec   Loss 7.6411   LearningRate 0.0277   Epoch: 9   Global Step: 117720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:29,136-Speed 3019.59 samples/sec   Loss 7.7123   LearningRate 0.0277   Epoch: 9   Global Step: 117730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:51:32,529-Speed 3019.30 samples/sec   Loss 7.5190   LearningRate 0.0277   Epoch: 9   Global Step: 117740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:51:35,977-Speed 2970.37 samples/sec   Loss 7.6797   LearningRate 0.0277   Epoch: 9   Global Step: 117750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:39,320-Speed 3064.11 samples/sec   Loss 7.4757   LearningRate 0.0277   Epoch: 9   Global Step: 117760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:42,728-Speed 3005.57 samples/sec   Loss 7.6751   LearningRate 0.0277   Epoch: 9   Global Step: 117770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:46,174-Speed 2972.50 samples/sec   Loss 7.6468   LearningRate 0.0277   Epoch: 9   Global Step: 117780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:49,565-Speed 3020.02 samples/sec   Loss 7.4841   LearningRate 0.0277   Epoch: 9   Global Step: 117790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:52,965-Speed 3012.65 samples/sec   Loss 7.5317   LearningRate 0.0276   Epoch: 9   Global Step: 117800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:56,401-Speed 2981.23 samples/sec   Loss 7.4887   LearningRate 0.0276   Epoch: 9   Global Step: 117810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:51:59,808-Speed 3005.91 samples/sec   Loss 7.6333   LearningRate 0.0276   Epoch: 9   Global Step: 117820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:52:03,276-Speed 2954.34 samples/sec   Loss 7.7171   LearningRate 0.0276   Epoch: 9   Global Step: 117830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:52:06,642-Speed 3042.23 samples/sec   Loss 7.6890   LearningRate 0.0276   Epoch: 9   Global Step: 117840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:52:10,026-Speed 3027.27 samples/sec   Loss 7.5477   LearningRate 0.0276   Epoch: 9   Global Step: 117850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:52:13,389-Speed 3046.18 samples/sec   Loss 7.7169   LearningRate 0.0276   Epoch: 9   Global Step: 117860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:52:16,752-Speed 3045.47 samples/sec   Loss 7.6091   LearningRate 0.0276   Epoch: 9   Global Step: 117870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:52:20,124-Speed 3037.17 samples/sec   Loss 7.6126   LearningRate 0.0276   Epoch: 9   Global Step: 117880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:52:23,513-Speed 3023.07 samples/sec   Loss 7.6595   LearningRate 0.0276   Epoch: 9   Global Step: 117890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:52:26,948-Speed 2981.41 samples/sec   Loss 7.6168   LearningRate 0.0276   Epoch: 9   Global Step: 117900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:52:30,334-Speed 3025.74 samples/sec   Loss 7.6854   LearningRate 0.0276   Epoch: 9   Global Step: 117910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:52:33,693-Speed 3049.41 samples/sec   Loss 7.6595   LearningRate 0.0276   Epoch: 9   Global Step: 117920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:52:37,092-Speed 3013.12 samples/sec   Loss 7.4997   LearningRate 0.0276   Epoch: 9   Global Step: 117930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:52:40,485-Speed 3018.98 samples/sec   Loss 7.7680   LearningRate 0.0276   Epoch: 9   Global Step: 117940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:52:43,949-Speed 2956.76 samples/sec   Loss 7.4602   LearningRate 0.0276   Epoch: 9   Global Step: 117950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:52:47,384-Speed 2982.38 samples/sec   Loss 7.6457   LearningRate 0.0276   Epoch: 9   Global Step: 117960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:52:50,841-Speed 2962.64 samples/sec   Loss 7.6053   LearningRate 0.0276   Epoch: 9   Global Step: 117970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:52:54,244-Speed 3009.52 samples/sec   Loss 7.7623   LearningRate 0.0276   Epoch: 9   Global Step: 117980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:52:57,593-Speed 3059.42 samples/sec   Loss 7.6400   LearningRate 0.0276   Epoch: 9   Global Step: 117990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:53:01,000-Speed 3006.45 samples/sec   Loss 7.5207   LearningRate 0.0276   Epoch: 9   Global Step: 118000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:53:04,368-Speed 3041.48 samples/sec   Loss 7.6789   LearningRate 0.0276   Epoch: 9   Global Step: 118010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:53:07,729-Speed 3047.47 samples/sec   Loss 7.7738   LearningRate 0.0276   Epoch: 9   Global Step: 118020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:53:11,040-Speed 3094.25 samples/sec   Loss 7.4863   LearningRate 0.0275   Epoch: 9   Global Step: 118030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:53:14,394-Speed 3052.97 samples/sec   Loss 7.6985   LearningRate 0.0275   Epoch: 9   Global Step: 118040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:53:17,750-Speed 3052.29 samples/sec   Loss 7.6033   LearningRate 0.0275   Epoch: 9   Global Step: 118050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:53:21,165-Speed 2999.29 samples/sec   Loss 7.7232   LearningRate 0.0275   Epoch: 9   Global Step: 118060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:53:24,585-Speed 2995.18 samples/sec   Loss 7.6001   LearningRate 0.0275   Epoch: 9   Global Step: 118070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:53:27,947-Speed 3046.53 samples/sec   Loss 7.7101   LearningRate 0.0275   Epoch: 9   Global Step: 118080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:53:31,370-Speed 2991.94 samples/sec   Loss 7.6455   LearningRate 0.0275   Epoch: 9   Global Step: 118090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:53:34,722-Speed 3055.83 samples/sec   Loss 7.6072   LearningRate 0.0275   Epoch: 9   Global Step: 118100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:53:38,059-Speed 3070.07 samples/sec   Loss 7.5820   LearningRate 0.0275   Epoch: 9   Global Step: 118110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:53:41,526-Speed 2954.18 samples/sec   Loss 7.7655   LearningRate 0.0275   Epoch: 9   Global Step: 118120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:53:44,858-Speed 3074.93 samples/sec   Loss 7.6253   LearningRate 0.0275   Epoch: 9   Global Step: 118130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:53:48,231-Speed 3037.18 samples/sec   Loss 7.6709   LearningRate 0.0275   Epoch: 9   Global Step: 118140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:53:51,575-Speed 3063.23 samples/sec   Loss 7.4893   LearningRate 0.0275   Epoch: 9   Global Step: 118150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:53:55,039-Speed 2956.32 samples/sec   Loss 7.6088   LearningRate 0.0275   Epoch: 9   Global Step: 118160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:53:58,470-Speed 2985.70 samples/sec   Loss 7.5578   LearningRate 0.0275   Epoch: 9   Global Step: 118170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:54:01,874-Speed 3008.87 samples/sec   Loss 7.7160   LearningRate 0.0275   Epoch: 9   Global Step: 118180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:54:05,210-Speed 3070.42 samples/sec   Loss 7.5795   LearningRate 0.0275   Epoch: 9   Global Step: 118190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:54:08,638-Speed 2989.28 samples/sec   Loss 7.5978   LearningRate 0.0275   Epoch: 9   Global Step: 118200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:54:11,983-Speed 3062.14 samples/sec   Loss 7.6149   LearningRate 0.0275   Epoch: 9   Global Step: 118210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:54:15,332-Speed 3057.97 samples/sec   Loss 7.5084   LearningRate 0.0275   Epoch: 9   Global Step: 118220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:54:18,721-Speed 3022.33 samples/sec   Loss 7.5288   LearningRate 0.0275   Epoch: 9   Global Step: 118230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:54:22,084-Speed 3046.54 samples/sec   Loss 7.5794   LearningRate 0.0275   Epoch: 9   Global Step: 118240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:54:25,468-Speed 3026.04 samples/sec   Loss 7.6582   LearningRate 0.0275   Epoch: 9   Global Step: 118250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:54:28,887-Speed 2996.06 samples/sec   Loss 7.6240   LearningRate 0.0275   Epoch: 9   Global Step: 118260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:54:32,307-Speed 2994.90 samples/sec   Loss 7.6451   LearningRate 0.0274   Epoch: 9   Global Step: 118270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:54:35,679-Speed 3037.84 samples/sec   Loss 7.5923   LearningRate 0.0274   Epoch: 9   Global Step: 118280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:54:39,012-Speed 3073.11 samples/sec   Loss 7.4606   LearningRate 0.0274   Epoch: 9   Global Step: 118290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:54:42,479-Speed 2954.49 samples/sec   Loss 7.5463   LearningRate 0.0274   Epoch: 9   Global Step: 118300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:54:45,796-Speed 3087.89 samples/sec   Loss 7.6263   LearningRate 0.0274   Epoch: 9   Global Step: 118310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:54:49,188-Speed 3019.92 samples/sec   Loss 7.7147   LearningRate 0.0274   Epoch: 9   Global Step: 118320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:54:52,525-Speed 3069.34 samples/sec   Loss 7.6236   LearningRate 0.0274   Epoch: 9   Global Step: 118330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:54:55,921-Speed 3015.95 samples/sec   Loss 7.8210   LearningRate 0.0274   Epoch: 9   Global Step: 118340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:54:59,239-Speed 3086.92 samples/sec   Loss 7.6602   LearningRate 0.0274   Epoch: 9   Global Step: 118350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:55:02,554-Speed 3089.77 samples/sec   Loss 7.5522   LearningRate 0.0274   Epoch: 9   Global Step: 118360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:55:06,027-Speed 2949.08 samples/sec   Loss 7.5320   LearningRate 0.0274   Epoch: 9   Global Step: 118370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:55:09,466-Speed 2979.00 samples/sec   Loss 7.5636   LearningRate 0.0274   Epoch: 9   Global Step: 118380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:55:12,850-Speed 3027.07 samples/sec   Loss 7.5971   LearningRate 0.0274   Epoch: 9   Global Step: 118390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:55:16,252-Speed 3010.25 samples/sec   Loss 7.5989   LearningRate 0.0274   Epoch: 9   Global Step: 118400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:55:19,610-Speed 3050.66 samples/sec   Loss 7.6519   LearningRate 0.0274   Epoch: 9   Global Step: 118410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:55:23,092-Speed 2942.71 samples/sec   Loss 7.6493   LearningRate 0.0274   Epoch: 9   Global Step: 118420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:55:26,468-Speed 3034.01 samples/sec   Loss 7.6327   LearningRate 0.0274   Epoch: 9   Global Step: 118430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:55:29,805-Speed 3069.19 samples/sec   Loss 7.5701   LearningRate 0.0274   Epoch: 9   Global Step: 118440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:55:33,147-Speed 3065.71 samples/sec   Loss 7.5165   LearningRate 0.0274   Epoch: 9   Global Step: 118450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:55:36,540-Speed 3018.70 samples/sec   Loss 7.5410   LearningRate 0.0274   Epoch: 9   Global Step: 118460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:55:39,859-Speed 3085.75 samples/sec   Loss 7.5837   LearningRate 0.0274   Epoch: 9   Global Step: 118470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:55:43,189-Speed 3076.14 samples/sec   Loss 7.6447   LearningRate 0.0274   Epoch: 9   Global Step: 118480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:55:46,566-Speed 3033.98 samples/sec   Loss 7.5365   LearningRate 0.0274   Epoch: 9   Global Step: 118490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:55:50,005-Speed 2978.29 samples/sec   Loss 7.5569   LearningRate 0.0274   Epoch: 9   Global Step: 118500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:55:53,400-Speed 3016.61 samples/sec   Loss 7.5844   LearningRate 0.0273   Epoch: 9   Global Step: 118510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:55:56,815-Speed 2999.46 samples/sec   Loss 7.5382   LearningRate 0.0273   Epoch: 9   Global Step: 118520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:56:00,223-Speed 3005.80 samples/sec   Loss 7.7465   LearningRate 0.0273   Epoch: 9   Global Step: 118530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:56:03,558-Speed 3071.18 samples/sec   Loss 7.7561   LearningRate 0.0273   Epoch: 9   Global Step: 118540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:56:06,914-Speed 3052.26 samples/sec   Loss 7.6537   LearningRate 0.0273   Epoch: 9   Global Step: 118550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:56:10,253-Speed 3068.18 samples/sec   Loss 7.7011   LearningRate 0.0273   Epoch: 9   Global Step: 118560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:56:13,604-Speed 3055.90 samples/sec   Loss 7.6981   LearningRate 0.0273   Epoch: 9   Global Step: 118570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:56:16,964-Speed 3048.08 samples/sec   Loss 7.5752   LearningRate 0.0273   Epoch: 9   Global Step: 118580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:56:20,317-Speed 3055.64 samples/sec   Loss 7.6200   LearningRate 0.0273   Epoch: 9   Global Step: 118590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:56:23,645-Speed 3077.47 samples/sec   Loss 7.5940   LearningRate 0.0273   Epoch: 9   Global Step: 118600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:56:27,006-Speed 3047.84 samples/sec   Loss 7.5614   LearningRate 0.0273   Epoch: 9   Global Step: 118610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:56:30,387-Speed 3029.48 samples/sec   Loss 7.5195   LearningRate 0.0273   Epoch: 9   Global Step: 118620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:56:33,718-Speed 3074.95 samples/sec   Loss 7.6294   LearningRate 0.0273   Epoch: 9   Global Step: 118630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:56:37,097-Speed 3030.95 samples/sec   Loss 7.6419   LearningRate 0.0273   Epoch: 9   Global Step: 118640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:56:40,414-Speed 3088.36 samples/sec   Loss 7.5894   LearningRate 0.0273   Epoch: 9   Global Step: 118650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:56:43,723-Speed 3095.57 samples/sec   Loss 7.7185   LearningRate 0.0273   Epoch: 9   Global Step: 118660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:56:47,075-Speed 3055.05 samples/sec   Loss 7.5643   LearningRate 0.0273   Epoch: 9   Global Step: 118670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:56:50,428-Speed 3055.50 samples/sec   Loss 7.6278   LearningRate 0.0273   Epoch: 9   Global Step: 118680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:56:53,817-Speed 3023.15 samples/sec   Loss 7.5531   LearningRate 0.0273   Epoch: 9   Global Step: 118690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:56:57,203-Speed 3024.96 samples/sec   Loss 7.6530   LearningRate 0.0273   Epoch: 9   Global Step: 118700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:57:00,593-Speed 3020.98 samples/sec   Loss 7.5282   LearningRate 0.0273   Epoch: 9   Global Step: 118710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:57:03,944-Speed 3056.82 samples/sec   Loss 7.7550   LearningRate 0.0273   Epoch: 9   Global Step: 118720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:57:07,311-Speed 3041.90 samples/sec   Loss 7.6462   LearningRate 0.0273   Epoch: 9   Global Step: 118730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:57:10,714-Speed 3010.67 samples/sec   Loss 7.6458   LearningRate 0.0273   Epoch: 9   Global Step: 118740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:57:14,052-Speed 3068.11 samples/sec   Loss 7.5665   LearningRate 0.0272   Epoch: 9   Global Step: 118750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:57:17,361-Speed 3095.70 samples/sec   Loss 7.6641   LearningRate 0.0272   Epoch: 9   Global Step: 118760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:57:20,824-Speed 2957.66 samples/sec   Loss 7.6128   LearningRate 0.0272   Epoch: 9   Global Step: 118770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:57:24,268-Speed 2974.11 samples/sec   Loss 7.6219   LearningRate 0.0272   Epoch: 9   Global Step: 118780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 12:57:27,712-Speed 2974.79 samples/sec   Loss 7.6800   LearningRate 0.0272   Epoch: 9   Global Step: 118790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:57:31,128-Speed 2997.66 samples/sec   Loss 7.5929   LearningRate 0.0272   Epoch: 9   Global Step: 118800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:57:34,508-Speed 3030.23 samples/sec   Loss 7.6978   LearningRate 0.0272   Epoch: 9   Global Step: 118810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:57:37,888-Speed 3031.23 samples/sec   Loss 7.6264   LearningRate 0.0272   Epoch: 9   Global Step: 118820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:57:41,208-Speed 3084.58 samples/sec   Loss 7.5286   LearningRate 0.0272   Epoch: 9   Global Step: 118830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:57:44,639-Speed 2985.56 samples/sec   Loss 7.5743   LearningRate 0.0272   Epoch: 9   Global Step: 118840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:57:48,019-Speed 3030.79 samples/sec   Loss 7.5981   LearningRate 0.0272   Epoch: 9   Global Step: 118850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:57:51,396-Speed 3033.25 samples/sec   Loss 7.5788   LearningRate 0.0272   Epoch: 9   Global Step: 118860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:57:54,794-Speed 3014.43 samples/sec   Loss 7.6045   LearningRate 0.0272   Epoch: 9   Global Step: 118870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:57:58,150-Speed 3052.51 samples/sec   Loss 7.5078   LearningRate 0.0272   Epoch: 9   Global Step: 118880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:58:01,515-Speed 3043.68 samples/sec   Loss 7.6536   LearningRate 0.0272   Epoch: 9   Global Step: 118890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:58:04,854-Speed 3067.83 samples/sec   Loss 7.6591   LearningRate 0.0272   Epoch: 9   Global Step: 118900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:58:08,167-Speed 3091.60 samples/sec   Loss 7.6314   LearningRate 0.0272   Epoch: 9   Global Step: 118910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:58:11,571-Speed 3009.39 samples/sec   Loss 7.5522   LearningRate 0.0272   Epoch: 9   Global Step: 118920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:58:15,022-Speed 2967.35 samples/sec   Loss 7.4822   LearningRate 0.0272   Epoch: 9   Global Step: 118930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:58:18,428-Speed 3007.85 samples/sec   Loss 7.5454   LearningRate 0.0272   Epoch: 9   Global Step: 118940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:58:21,810-Speed 3028.53 samples/sec   Loss 7.5817   LearningRate 0.0272   Epoch: 9   Global Step: 118950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:58:25,133-Speed 3082.09 samples/sec   Loss 7.5560   LearningRate 0.0272   Epoch: 9   Global Step: 118960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 12:58:28,439-Speed 3098.66 samples/sec   Loss 7.6358   LearningRate 0.0272   Epoch: 9   Global Step: 118970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:58:31,772-Speed 3073.04 samples/sec   Loss 7.6478   LearningRate 0.0271   Epoch: 9   Global Step: 118980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:58:35,133-Speed 3047.76 samples/sec   Loss 7.4202   LearningRate 0.0271   Epoch: 9   Global Step: 118990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:58:38,568-Speed 2981.68 samples/sec   Loss 7.5984   LearningRate 0.0271   Epoch: 9   Global Step: 119000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:58:42,017-Speed 2970.32 samples/sec   Loss 7.5262   LearningRate 0.0271   Epoch: 9   Global Step: 119010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:58:45,382-Speed 3043.19 samples/sec   Loss 7.6818   LearningRate 0.0271   Epoch: 9   Global Step: 119020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:58:48,749-Speed 3042.88 samples/sec   Loss 7.5507   LearningRate 0.0271   Epoch: 9   Global Step: 119030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:58:52,165-Speed 2997.81 samples/sec   Loss 7.5947   LearningRate 0.0271   Epoch: 9   Global Step: 119040   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 12:58:55,624-Speed 2961.64 samples/sec   Loss 7.5916   LearningRate 0.0271   Epoch: 9   Global Step: 119050   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 12:58:59,000-Speed 3033.50 samples/sec   Loss 7.6106   LearningRate 0.0271   Epoch: 9   Global Step: 119060   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 12:59:02,369-Speed 3040.64 samples/sec   Loss 7.5770   LearningRate 0.0271   Epoch: 9   Global Step: 119070   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 12:59:05,726-Speed 3051.30 samples/sec   Loss 7.6353   LearningRate 0.0271   Epoch: 9   Global Step: 119080   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 12:59:09,158-Speed 2984.28 samples/sec   Loss 7.5589   LearningRate 0.0271   Epoch: 9   Global Step: 119090   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 12:59:12,511-Speed 3054.81 samples/sec   Loss 7.6239   LearningRate 0.0271   Epoch: 9   Global Step: 119100   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 12:59:15,889-Speed 3032.59 samples/sec   Loss 7.6145   LearningRate 0.0271   Epoch: 9   Global Step: 119110   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 12:59:19,271-Speed 3028.64 samples/sec   Loss 7.5840   LearningRate 0.0271   Epoch: 9   Global Step: 119120   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 12:59:22,734-Speed 2957.97 samples/sec   Loss 7.7567   LearningRate 0.0271   Epoch: 9   Global Step: 119130   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 12:59:26,073-Speed 3067.03 samples/sec   Loss 7.6430   LearningRate 0.0271   Epoch: 9   Global Step: 119140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:59:29,460-Speed 3024.68 samples/sec   Loss 7.6020   LearningRate 0.0271   Epoch: 9   Global Step: 119150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:59:32,915-Speed 2964.67 samples/sec   Loss 7.4362   LearningRate 0.0271   Epoch: 9   Global Step: 119160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:59:36,343-Speed 2988.11 samples/sec   Loss 7.5708   LearningRate 0.0271   Epoch: 9   Global Step: 119170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:59:39,756-Speed 3000.96 samples/sec   Loss 7.5104   LearningRate 0.0271   Epoch: 9   Global Step: 119180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:59:43,164-Speed 3005.76 samples/sec   Loss 7.7098   LearningRate 0.0271   Epoch: 9   Global Step: 119190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:59:46,544-Speed 3030.37 samples/sec   Loss 7.5240   LearningRate 0.0271   Epoch: 9   Global Step: 119200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:59:49,954-Speed 3004.21 samples/sec   Loss 7.6534   LearningRate 0.0271   Epoch: 9   Global Step: 119210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:59:53,271-Speed 3087.45 samples/sec   Loss 7.5094   LearningRate 0.0270   Epoch: 9   Global Step: 119220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 12:59:56,635-Speed 3044.73 samples/sec   Loss 7.6250   LearningRate 0.0270   Epoch: 9   Global Step: 119230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:00:00,023-Speed 3023.29 samples/sec   Loss 7.5294   LearningRate 0.0270   Epoch: 9   Global Step: 119240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:00:03,390-Speed 3042.25 samples/sec   Loss 7.4600   LearningRate 0.0270   Epoch: 9   Global Step: 119250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:00:06,760-Speed 3040.58 samples/sec   Loss 7.6760   LearningRate 0.0270   Epoch: 9   Global Step: 119260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:00:10,149-Speed 3022.50 samples/sec   Loss 7.6479   LearningRate 0.0270   Epoch: 9   Global Step: 119270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:00:13,603-Speed 2965.02 samples/sec   Loss 7.5456   LearningRate 0.0270   Epoch: 9   Global Step: 119280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:00:17,030-Speed 2989.45 samples/sec   Loss 7.6312   LearningRate 0.0270   Epoch: 9   Global Step: 119290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:00:20,380-Speed 3057.78 samples/sec   Loss 7.6820   LearningRate 0.0270   Epoch: 9   Global Step: 119300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:00:23,741-Speed 3046.90 samples/sec   Loss 7.5219   LearningRate 0.0270   Epoch: 9   Global Step: 119310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:00:27,103-Speed 3047.22 samples/sec   Loss 7.5402   LearningRate 0.0270   Epoch: 9   Global Step: 119320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:00:30,565-Speed 2958.88 samples/sec   Loss 7.5652   LearningRate 0.0270   Epoch: 9   Global Step: 119330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:00:33,930-Speed 3043.54 samples/sec   Loss 7.5639   LearningRate 0.0270   Epoch: 9   Global Step: 119340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:00:37,323-Speed 3019.22 samples/sec   Loss 7.5793   LearningRate 0.0270   Epoch: 9   Global Step: 119350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:00:40,666-Speed 3063.89 samples/sec   Loss 7.6652   LearningRate 0.0270   Epoch: 9   Global Step: 119360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:00:44,015-Speed 3058.00 samples/sec   Loss 7.6793   LearningRate 0.0270   Epoch: 9   Global Step: 119370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:00:47,368-Speed 3054.96 samples/sec   Loss 7.6506   LearningRate 0.0270   Epoch: 9   Global Step: 119380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:00:50,778-Speed 3004.14 samples/sec   Loss 7.6186   LearningRate 0.0270   Epoch: 9   Global Step: 119390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:00:54,156-Speed 3031.71 samples/sec   Loss 7.5650   LearningRate 0.0270   Epoch: 9   Global Step: 119400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:00:57,519-Speed 3046.02 samples/sec   Loss 7.6451   LearningRate 0.0270   Epoch: 9   Global Step: 119410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:01:00,907-Speed 3023.18 samples/sec   Loss 7.5531   LearningRate 0.0270   Epoch: 9   Global Step: 119420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:01:04,258-Speed 3057.37 samples/sec   Loss 7.5176   LearningRate 0.0270   Epoch: 9   Global Step: 119430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:01:07,627-Speed 3039.91 samples/sec   Loss 7.4437   LearningRate 0.0270   Epoch: 9   Global Step: 119440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:01:11,035-Speed 3005.79 samples/sec   Loss 7.5614   LearningRate 0.0270   Epoch: 9   Global Step: 119450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:01:14,447-Speed 3002.59 samples/sec   Loss 7.4502   LearningRate 0.0269   Epoch: 9   Global Step: 119460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:01:17,763-Speed 3088.30 samples/sec   Loss 7.5495   LearningRate 0.0269   Epoch: 9   Global Step: 119470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:01:21,180-Speed 2997.90 samples/sec   Loss 7.6460   LearningRate 0.0269   Epoch: 9   Global Step: 119480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:01:24,564-Speed 3026.49 samples/sec   Loss 7.5743   LearningRate 0.0269   Epoch: 9   Global Step: 119490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:01:27,933-Speed 3040.77 samples/sec   Loss 7.5330   LearningRate 0.0269   Epoch: 9   Global Step: 119500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:01:31,312-Speed 3031.08 samples/sec   Loss 7.4597   LearningRate 0.0269   Epoch: 9   Global Step: 119510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:01:34,771-Speed 2960.94 samples/sec   Loss 7.5836   LearningRate 0.0269   Epoch: 9   Global Step: 119520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:01:38,126-Speed 3052.99 samples/sec   Loss 7.4307   LearningRate 0.0269   Epoch: 9   Global Step: 119530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:01:41,484-Speed 3051.20 samples/sec   Loss 7.5140   LearningRate 0.0269   Epoch: 9   Global Step: 119540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:01:44,834-Speed 3057.37 samples/sec   Loss 7.6195   LearningRate 0.0269   Epoch: 9   Global Step: 119550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:01:48,238-Speed 3009.04 samples/sec   Loss 7.6064   LearningRate 0.0269   Epoch: 9   Global Step: 119560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:01:51,691-Speed 2966.05 samples/sec   Loss 7.5532   LearningRate 0.0269   Epoch: 9   Global Step: 119570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:01:55,121-Speed 2986.41 samples/sec   Loss 7.6393   LearningRate 0.0269   Epoch: 9   Global Step: 119580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:01:58,541-Speed 2995.19 samples/sec   Loss 7.6773   LearningRate 0.0269   Epoch: 9   Global Step: 119590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:01,949-Speed 3005.25 samples/sec   Loss 7.5478   LearningRate 0.0269   Epoch: 9   Global Step: 119600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:05,276-Speed 3078.89 samples/sec   Loss 7.5775   LearningRate 0.0269   Epoch: 9   Global Step: 119610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:08,613-Speed 3069.09 samples/sec   Loss 7.5452   LearningRate 0.0269   Epoch: 9   Global Step: 119620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:11,945-Speed 3074.63 samples/sec   Loss 7.5533   LearningRate 0.0269   Epoch: 9   Global Step: 119630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:15,323-Speed 3032.60 samples/sec   Loss 7.5817   LearningRate 0.0269   Epoch: 9   Global Step: 119640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:18,706-Speed 3027.86 samples/sec   Loss 7.6127   LearningRate 0.0269   Epoch: 9   Global Step: 119650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:22,088-Speed 3028.70 samples/sec   Loss 7.5665   LearningRate 0.0269   Epoch: 9   Global Step: 119660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:25,451-Speed 3046.27 samples/sec   Loss 7.5634   LearningRate 0.0269   Epoch: 9   Global Step: 119670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:28,833-Speed 3028.36 samples/sec   Loss 7.7119   LearningRate 0.0269   Epoch: 9   Global Step: 119680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:32,169-Speed 3070.56 samples/sec   Loss 7.5836   LearningRate 0.0269   Epoch: 9   Global Step: 119690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:35,557-Speed 3023.27 samples/sec   Loss 7.6129   LearningRate 0.0268   Epoch: 9   Global Step: 119700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:38,967-Speed 3003.22 samples/sec   Loss 7.6266   LearningRate 0.0268   Epoch: 9   Global Step: 119710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:42,321-Speed 3054.59 samples/sec   Loss 7.4703   LearningRate 0.0268   Epoch: 9   Global Step: 119720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:45,688-Speed 3041.79 samples/sec   Loss 7.6874   LearningRate 0.0268   Epoch: 9   Global Step: 119730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:49,064-Speed 3034.18 samples/sec   Loss 7.6840   LearningRate 0.0268   Epoch: 9   Global Step: 119740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:02:52,404-Speed 3066.38 samples/sec   Loss 7.4761   LearningRate 0.0268   Epoch: 9   Global Step: 119750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:02:55,786-Speed 3029.06 samples/sec   Loss 7.6535   LearningRate 0.0268   Epoch: 9   Global Step: 119760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:02:59,227-Speed 2976.90 samples/sec   Loss 7.6884   LearningRate 0.0268   Epoch: 9   Global Step: 119770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:02,624-Speed 3015.24 samples/sec   Loss 7.4887   LearningRate 0.0268   Epoch: 9   Global Step: 119780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:06,015-Speed 3020.36 samples/sec   Loss 7.6104   LearningRate 0.0268   Epoch: 9   Global Step: 119790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:09,387-Speed 3037.46 samples/sec   Loss 7.5930   LearningRate 0.0268   Epoch: 9   Global Step: 119800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:12,759-Speed 3037.70 samples/sec   Loss 7.7115   LearningRate 0.0268   Epoch: 9   Global Step: 119810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:16,080-Speed 3084.05 samples/sec   Loss 7.5885   LearningRate 0.0268   Epoch: 9   Global Step: 119820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:19,428-Speed 3059.70 samples/sec   Loss 7.4473   LearningRate 0.0268   Epoch: 9   Global Step: 119830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:22,812-Speed 3026.90 samples/sec   Loss 7.5861   LearningRate 0.0268   Epoch: 9   Global Step: 119840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:26,184-Speed 3037.26 samples/sec   Loss 7.4608   LearningRate 0.0268   Epoch: 9   Global Step: 119850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:03:29,534-Speed 3057.86 samples/sec   Loss 7.5567   LearningRate 0.0268   Epoch: 9   Global Step: 119860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:32,927-Speed 3018.82 samples/sec   Loss 7.5656   LearningRate 0.0268   Epoch: 9   Global Step: 119870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:36,284-Speed 3051.49 samples/sec   Loss 7.5748   LearningRate 0.0268   Epoch: 9   Global Step: 119880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:39,666-Speed 3028.12 samples/sec   Loss 7.5294   LearningRate 0.0268   Epoch: 9   Global Step: 119890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:43,038-Speed 3037.84 samples/sec   Loss 7.4209   LearningRate 0.0268   Epoch: 9   Global Step: 119900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:46,376-Speed 3068.58 samples/sec   Loss 7.4964   LearningRate 0.0268   Epoch: 9   Global Step: 119910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:49,767-Speed 3020.42 samples/sec   Loss 7.5258   LearningRate 0.0268   Epoch: 9   Global Step: 119920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:53,095-Speed 3078.35 samples/sec   Loss 7.4511   LearningRate 0.0268   Epoch: 9   Global Step: 119930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:56,465-Speed 3039.31 samples/sec   Loss 7.4799   LearningRate 0.0267   Epoch: 9   Global Step: 119940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:03:59,861-Speed 3015.75 samples/sec   Loss 7.6491   LearningRate 0.0267   Epoch: 9   Global Step: 119950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:03,212-Speed 3057.24 samples/sec   Loss 7.5803   LearningRate 0.0267   Epoch: 9   Global Step: 119960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:06,581-Speed 3039.69 samples/sec   Loss 7.4944   LearningRate 0.0267   Epoch: 9   Global Step: 119970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:09,936-Speed 3053.16 samples/sec   Loss 7.5834   LearningRate 0.0267   Epoch: 9   Global Step: 119980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:13,356-Speed 2995.01 samples/sec   Loss 7.6002   LearningRate 0.0267   Epoch: 9   Global Step: 119990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:16,768-Speed 3002.18 samples/sec   Loss 7.4197   LearningRate 0.0267   Epoch: 9   Global Step: 120000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:20,174-Speed 3007.07 samples/sec   Loss 7.4269   LearningRate 0.0267   Epoch: 9   Global Step: 120010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:23,543-Speed 3040.33 samples/sec   Loss 7.5622   LearningRate 0.0267   Epoch: 9   Global Step: 120020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:26,921-Speed 3032.40 samples/sec   Loss 7.5197   LearningRate 0.0267   Epoch: 9   Global Step: 120030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:30,249-Speed 3078.22 samples/sec   Loss 7.4830   LearningRate 0.0267   Epoch: 9   Global Step: 120040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:33,682-Speed 2983.16 samples/sec   Loss 7.6461   LearningRate 0.0267   Epoch: 9   Global Step: 120050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:04:37,038-Speed 3051.71 samples/sec   Loss 7.5885   LearningRate 0.0267   Epoch: 9   Global Step: 120060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:04:40,445-Speed 3007.06 samples/sec   Loss 7.4667   LearningRate 0.0267   Epoch: 9   Global Step: 120070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:04:43,817-Speed 3037.72 samples/sec   Loss 7.4490   LearningRate 0.0267   Epoch: 9   Global Step: 120080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:47,188-Speed 3038.02 samples/sec   Loss 7.4519   LearningRate 0.0267   Epoch: 9   Global Step: 120090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:50,578-Speed 3021.50 samples/sec   Loss 7.5686   LearningRate 0.0267   Epoch: 9   Global Step: 120100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:53,977-Speed 3014.08 samples/sec   Loss 7.5802   LearningRate 0.0267   Epoch: 9   Global Step: 120110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:04:57,404-Speed 2988.86 samples/sec   Loss 7.5683   LearningRate 0.0267   Epoch: 9   Global Step: 120120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:05:00,749-Speed 3062.31 samples/sec   Loss 7.4613   LearningRate 0.0267   Epoch: 9   Global Step: 120130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:05:04,192-Speed 2974.30 samples/sec   Loss 7.6408   LearningRate 0.0267   Epoch: 9   Global Step: 120140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:05:07,539-Speed 3060.89 samples/sec   Loss 7.4451   LearningRate 0.0267   Epoch: 9   Global Step: 120150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:05:10,898-Speed 3049.12 samples/sec   Loss 7.7112   LearningRate 0.0267   Epoch: 9   Global Step: 120160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:05:14,209-Speed 3093.52 samples/sec   Loss 7.5948   LearningRate 0.0267   Epoch: 9   Global Step: 120170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:05:17,574-Speed 3043.73 samples/sec   Loss 7.5236   LearningRate 0.0266   Epoch: 9   Global Step: 120180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:05:20,994-Speed 2995.63 samples/sec   Loss 7.3801   LearningRate 0.0266   Epoch: 9   Global Step: 120190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:05:24,383-Speed 3021.62 samples/sec   Loss 7.5062   LearningRate 0.0266   Epoch: 9   Global Step: 120200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:05:27,736-Speed 3055.08 samples/sec   Loss 7.4839   LearningRate 0.0266   Epoch: 9   Global Step: 120210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:05:31,091-Speed 3053.35 samples/sec   Loss 7.4254   LearningRate 0.0266   Epoch: 9   Global Step: 120220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:05:34,464-Speed 3036.85 samples/sec   Loss 7.5951   LearningRate 0.0266   Epoch: 9   Global Step: 120230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:05:37,827-Speed 3045.42 samples/sec   Loss 7.6911   LearningRate 0.0266   Epoch: 9   Global Step: 120240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:05:41,270-Speed 2975.19 samples/sec   Loss 7.4662   LearningRate 0.0266   Epoch: 9   Global Step: 120250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:05:44,636-Speed 3042.88 samples/sec   Loss 7.5039   LearningRate 0.0266   Epoch: 9   Global Step: 120260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:05:48,041-Speed 3008.31 samples/sec   Loss 7.5131   LearningRate 0.0266   Epoch: 9   Global Step: 120270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:05:51,489-Speed 2970.84 samples/sec   Loss 7.5580   LearningRate 0.0266   Epoch: 9   Global Step: 120280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:05:54,837-Speed 3058.39 samples/sec   Loss 7.4277   LearningRate 0.0266   Epoch: 9   Global Step: 120290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:05:58,219-Speed 3028.82 samples/sec   Loss 7.5269   LearningRate 0.0266   Epoch: 9   Global Step: 120300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:06:01,663-Speed 2974.81 samples/sec   Loss 7.4975   LearningRate 0.0266   Epoch: 9   Global Step: 120310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:06:04,967-Speed 3099.06 samples/sec   Loss 7.4333   LearningRate 0.0266   Epoch: 9   Global Step: 120320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:06:08,340-Speed 3037.39 samples/sec   Loss 7.5262   LearningRate 0.0266   Epoch: 9   Global Step: 120330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:06:11,769-Speed 2986.79 samples/sec   Loss 7.4192   LearningRate 0.0266   Epoch: 9   Global Step: 120340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:06:15,159-Speed 3021.21 samples/sec   Loss 7.4288   LearningRate 0.0266   Epoch: 9   Global Step: 120350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:06:18,597-Speed 2979.86 samples/sec   Loss 7.5796   LearningRate 0.0266   Epoch: 9   Global Step: 120360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:06:21,939-Speed 3064.41 samples/sec   Loss 7.5737   LearningRate 0.0266   Epoch: 9   Global Step: 120370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:06:25,257-Speed 3086.97 samples/sec   Loss 7.4685   LearningRate 0.0266   Epoch: 9   Global Step: 120380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:06:28,649-Speed 3020.08 samples/sec   Loss 7.4935   LearningRate 0.0266   Epoch: 9   Global Step: 120390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:06:32,038-Speed 3021.93 samples/sec   Loss 7.5123   LearningRate 0.0266   Epoch: 9   Global Step: 120400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:06:35,391-Speed 3054.96 samples/sec   Loss 7.4976   LearningRate 0.0266   Epoch: 9   Global Step: 120410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:06:38,774-Speed 3027.88 samples/sec   Loss 7.5744   LearningRate 0.0265   Epoch: 9   Global Step: 120420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:06:42,172-Speed 3014.24 samples/sec   Loss 7.5634   LearningRate 0.0265   Epoch: 9   Global Step: 120430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:06:45,595-Speed 2992.61 samples/sec   Loss 7.5963   LearningRate 0.0265   Epoch: 9   Global Step: 120440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:06:49,030-Speed 2981.52 samples/sec   Loss 7.5079   LearningRate 0.0265   Epoch: 9   Global Step: 120450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:06:52,493-Speed 2958.05 samples/sec   Loss 7.5071   LearningRate 0.0265   Epoch: 9   Global Step: 120460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:06:55,846-Speed 3055.31 samples/sec   Loss 7.5291   LearningRate 0.0265   Epoch: 9   Global Step: 120470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:06:59,245-Speed 3013.70 samples/sec   Loss 7.5113   LearningRate 0.0265   Epoch: 9   Global Step: 120480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:07:02,623-Speed 3032.05 samples/sec   Loss 7.5392   LearningRate 0.0265   Epoch: 9   Global Step: 120490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:07:06,076-Speed 2966.38 samples/sec   Loss 7.5819   LearningRate 0.0265   Epoch: 9   Global Step: 120500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:07:09,468-Speed 3019.38 samples/sec   Loss 7.5787   LearningRate 0.0265   Epoch: 9   Global Step: 120510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:07:12,912-Speed 2974.53 samples/sec   Loss 7.5808   LearningRate 0.0265   Epoch: 9   Global Step: 120520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:07:16,267-Speed 3052.99 samples/sec   Loss 7.6522   LearningRate 0.0265   Epoch: 9   Global Step: 120530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:07:19,661-Speed 3018.38 samples/sec   Loss 7.4786   LearningRate 0.0265   Epoch: 9   Global Step: 120540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:07:23,034-Speed 3036.17 samples/sec   Loss 7.4607   LearningRate 0.0265   Epoch: 9   Global Step: 120550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:07:26,389-Speed 3053.38 samples/sec   Loss 7.5221   LearningRate 0.0265   Epoch: 9   Global Step: 120560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:07:29,765-Speed 3034.24 samples/sec   Loss 7.3870   LearningRate 0.0265   Epoch: 9   Global Step: 120570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:07:33,124-Speed 3049.18 samples/sec   Loss 7.4908   LearningRate 0.0265   Epoch: 9   Global Step: 120580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:07:36,555-Speed 2984.66 samples/sec   Loss 7.4504   LearningRate 0.0265   Epoch: 9   Global Step: 120590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:07:39,947-Speed 3019.88 samples/sec   Loss 7.4802   LearningRate 0.0265   Epoch: 9   Global Step: 120600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:07:43,349-Speed 3011.29 samples/sec   Loss 7.5588   LearningRate 0.0265   Epoch: 9   Global Step: 120610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:07:46,872-Speed 2907.13 samples/sec   Loss 7.3641   LearningRate 0.0265   Epoch: 9   Global Step: 120620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:07:50,205-Speed 3072.69 samples/sec   Loss 7.4879   LearningRate 0.0265   Epoch: 9   Global Step: 120630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:07:53,553-Speed 3059.55 samples/sec   Loss 7.6514   LearningRate 0.0265   Epoch: 9   Global Step: 120640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:07:56,927-Speed 3035.69 samples/sec   Loss 7.5151   LearningRate 0.0265   Epoch: 9   Global Step: 120650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:00,338-Speed 3003.24 samples/sec   Loss 7.4638   LearningRate 0.0264   Epoch: 9   Global Step: 120660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:03,707-Speed 3040.17 samples/sec   Loss 7.4871   LearningRate 0.0264   Epoch: 9   Global Step: 120670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:07,098-Speed 3019.83 samples/sec   Loss 7.3654   LearningRate 0.0264   Epoch: 9   Global Step: 120680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:10,466-Speed 3042.01 samples/sec   Loss 7.4451   LearningRate 0.0264   Epoch: 9   Global Step: 120690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:13,843-Speed 3033.25 samples/sec   Loss 7.5241   LearningRate 0.0264   Epoch: 9   Global Step: 120700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:17,222-Speed 3030.48 samples/sec   Loss 7.4685   LearningRate 0.0264   Epoch: 9   Global Step: 120710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:20,576-Speed 3054.74 samples/sec   Loss 7.5651   LearningRate 0.0264   Epoch: 9   Global Step: 120720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:23,910-Speed 3071.46 samples/sec   Loss 7.5038   LearningRate 0.0264   Epoch: 9   Global Step: 120730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:27,400-Speed 2935.15 samples/sec   Loss 7.5174   LearningRate 0.0264   Epoch: 9   Global Step: 120740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:30,821-Speed 2993.41 samples/sec   Loss 7.5125   LearningRate 0.0264   Epoch: 9   Global Step: 120750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:08:34,206-Speed 3026.21 samples/sec   Loss 7.4496   LearningRate 0.0264   Epoch: 9   Global Step: 120760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:08:37,532-Speed 3079.93 samples/sec   Loss 7.4812   LearningRate 0.0264   Epoch: 9   Global Step: 120770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:08:40,885-Speed 3054.59 samples/sec   Loss 7.4406   LearningRate 0.0264   Epoch: 9   Global Step: 120780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:08:44,213-Speed 3078.11 samples/sec   Loss 7.6511   LearningRate 0.0264   Epoch: 9   Global Step: 120790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:08:47,554-Speed 3065.64 samples/sec   Loss 7.4651   LearningRate 0.0264   Epoch: 9   Global Step: 120800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:50,993-Speed 2978.47 samples/sec   Loss 7.4636   LearningRate 0.0264   Epoch: 9   Global Step: 120810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:54,391-Speed 3014.30 samples/sec   Loss 7.5041   LearningRate 0.0264   Epoch: 9   Global Step: 120820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:08:57,748-Speed 3051.41 samples/sec   Loss 7.5855   LearningRate 0.0264   Epoch: 9   Global Step: 120830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:09:01,149-Speed 3011.78 samples/sec   Loss 7.5056   LearningRate 0.0264   Epoch: 9   Global Step: 120840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:09:04,517-Speed 3041.12 samples/sec   Loss 7.5257   LearningRate 0.0264   Epoch: 9   Global Step: 120850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:09:07,933-Speed 2998.82 samples/sec   Loss 7.4678   LearningRate 0.0264   Epoch: 9   Global Step: 120860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:09:11,257-Speed 3081.03 samples/sec   Loss 7.4252   LearningRate 0.0264   Epoch: 9   Global Step: 120870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:09:14,639-Speed 3028.81 samples/sec   Loss 7.5718   LearningRate 0.0264   Epoch: 9   Global Step: 120880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:09:17,955-Speed 3089.45 samples/sec   Loss 7.5012   LearningRate 0.0264   Epoch: 9   Global Step: 120890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:09:21,260-Speed 3099.11 samples/sec   Loss 7.5694   LearningRate 0.0264   Epoch: 9   Global Step: 120900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:09:24,629-Speed 3040.32 samples/sec   Loss 7.3880   LearningRate 0.0263   Epoch: 9   Global Step: 120910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:09:28,069-Speed 2977.12 samples/sec   Loss 7.4762   LearningRate 0.0263   Epoch: 9   Global Step: 120920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:09:31,475-Speed 3007.87 samples/sec   Loss 7.5407   LearningRate 0.0263   Epoch: 9   Global Step: 120930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:09:34,852-Speed 3033.26 samples/sec   Loss 7.5118   LearningRate 0.0263   Epoch: 9   Global Step: 120940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:09:38,330-Speed 2944.69 samples/sec   Loss 7.4993   LearningRate 0.0263   Epoch: 9   Global Step: 120950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:09:41,754-Speed 2991.60 samples/sec   Loss 7.4838   LearningRate 0.0263   Epoch: 9   Global Step: 120960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:09:45,134-Speed 3030.51 samples/sec   Loss 7.4238   LearningRate 0.0263   Epoch: 9   Global Step: 120970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:09:48,502-Speed 3041.37 samples/sec   Loss 7.4448   LearningRate 0.0263   Epoch: 9   Global Step: 120980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:09:51,880-Speed 3032.22 samples/sec   Loss 7.5185   LearningRate 0.0263   Epoch: 9   Global Step: 120990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:09:55,294-Speed 3000.26 samples/sec   Loss 7.5407   LearningRate 0.0263   Epoch: 9   Global Step: 121000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:09:58,641-Speed 3060.23 samples/sec   Loss 7.4593   LearningRate 0.0263   Epoch: 9   Global Step: 121010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:10:02,012-Speed 3038.34 samples/sec   Loss 7.4822   LearningRate 0.0263   Epoch: 9   Global Step: 121020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:10:05,364-Speed 3056.53 samples/sec   Loss 7.4325   LearningRate 0.0263   Epoch: 9   Global Step: 121030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:10:08,772-Speed 3005.25 samples/sec   Loss 7.5354   LearningRate 0.0263   Epoch: 9   Global Step: 121040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:10:12,119-Speed 3060.00 samples/sec   Loss 7.4410   LearningRate 0.0263   Epoch: 9   Global Step: 121050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:10:15,470-Speed 3056.67 samples/sec   Loss 7.5258   LearningRate 0.0263   Epoch: 9   Global Step: 121060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:10:18,814-Speed 3062.99 samples/sec   Loss 7.5896   LearningRate 0.0263   Epoch: 9   Global Step: 121070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:10:22,241-Speed 2989.42 samples/sec   Loss 7.5575   LearningRate 0.0263   Epoch: 9   Global Step: 121080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:10:25,550-Speed 3095.65 samples/sec   Loss 7.5060   LearningRate 0.0263   Epoch: 9   Global Step: 121090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:10:28,954-Speed 3008.60 samples/sec   Loss 7.4529   LearningRate 0.0263   Epoch: 9   Global Step: 121100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:10:32,399-Speed 2973.63 samples/sec   Loss 7.5414   LearningRate 0.0263   Epoch: 9   Global Step: 121110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:10:35,785-Speed 3024.48 samples/sec   Loss 7.5746   LearningRate 0.0263   Epoch: 9   Global Step: 121120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:10:39,150-Speed 3043.72 samples/sec   Loss 7.3921   LearningRate 0.0263   Epoch: 9   Global Step: 121130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:10:42,495-Speed 3062.37 samples/sec   Loss 7.4266   LearningRate 0.0263   Epoch: 9   Global Step: 121140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:10:45,898-Speed 3009.80 samples/sec   Loss 7.5072   LearningRate 0.0262   Epoch: 9   Global Step: 121150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:10:49,214-Speed 3088.83 samples/sec   Loss 7.4269   LearningRate 0.0262   Epoch: 9   Global Step: 121160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:10:52,620-Speed 3008.13 samples/sec   Loss 7.4368   LearningRate 0.0262   Epoch: 9   Global Step: 121170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:10:55,973-Speed 3054.60 samples/sec   Loss 7.4867   LearningRate 0.0262   Epoch: 9   Global Step: 121180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:10:59,358-Speed 3026.12 samples/sec   Loss 7.4305   LearningRate 0.0262   Epoch: 9   Global Step: 121190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:11:02,761-Speed 3009.41 samples/sec   Loss 7.5175   LearningRate 0.0262   Epoch: 9   Global Step: 121200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:11:06,165-Speed 3009.12 samples/sec   Loss 7.4522   LearningRate 0.0262   Epoch: 9   Global Step: 121210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:11:09,535-Speed 3039.60 samples/sec   Loss 7.5364   LearningRate 0.0262   Epoch: 9   Global Step: 121220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:11:13,557-Speed 2546.70 samples/sec   Loss 7.3703   LearningRate 0.0262   Epoch: 9   Global Step: 121230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:11:16,927-Speed 3039.87 samples/sec   Loss 7.3743   LearningRate 0.0262   Epoch: 9   Global Step: 121240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:11:20,352-Speed 2990.95 samples/sec   Loss 7.5928   LearningRate 0.0262   Epoch: 9   Global Step: 121250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:11:24,391-Speed 2535.23 samples/sec   Loss 7.5997   LearningRate 0.0262   Epoch: 9   Global Step: 121260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:11:27,826-Speed 2982.48 samples/sec   Loss 7.4407   LearningRate 0.0262   Epoch: 9   Global Step: 121270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:11:32,349-Speed 2264.49 samples/sec   Loss 7.5550   LearningRate 0.0262   Epoch: 9   Global Step: 121280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:11:35,797-Speed 2970.29 samples/sec   Loss 7.4438   LearningRate 0.0262   Epoch: 9   Global Step: 121290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:11:39,234-Speed 2980.37 samples/sec   Loss 7.5111   LearningRate 0.0262   Epoch: 9   Global Step: 121300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:11:42,619-Speed 3025.57 samples/sec   Loss 7.5153   LearningRate 0.0262   Epoch: 9   Global Step: 121310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:11:46,037-Speed 2996.73 samples/sec   Loss 7.4734   LearningRate 0.0262   Epoch: 9   Global Step: 121320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:11:49,439-Speed 3010.78 samples/sec   Loss 7.4208   LearningRate 0.0262   Epoch: 9   Global Step: 121330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:11:52,808-Speed 3040.33 samples/sec   Loss 7.6382   LearningRate 0.0262   Epoch: 9   Global Step: 121340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:11:56,194-Speed 3025.53 samples/sec   Loss 7.3132   LearningRate 0.0262   Epoch: 9   Global Step: 121350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:11:59,608-Speed 2999.87 samples/sec   Loss 7.5179   LearningRate 0.0262   Epoch: 9   Global Step: 121360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:02,939-Speed 3075.46 samples/sec   Loss 7.5468   LearningRate 0.0262   Epoch: 9   Global Step: 121370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:06,276-Speed 3068.96 samples/sec   Loss 7.5599   LearningRate 0.0262   Epoch: 9   Global Step: 121380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:09,657-Speed 3029.92 samples/sec   Loss 7.4458   LearningRate 0.0261   Epoch: 9   Global Step: 121390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:13,004-Speed 3059.90 samples/sec   Loss 7.5253   LearningRate 0.0261   Epoch: 9   Global Step: 121400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:16,348-Speed 3062.57 samples/sec   Loss 7.5246   LearningRate 0.0261   Epoch: 9   Global Step: 121410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:19,845-Speed 2929.81 samples/sec   Loss 7.4687   LearningRate 0.0261   Epoch: 9   Global Step: 121420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:23,217-Speed 3037.73 samples/sec   Loss 7.4656   LearningRate 0.0261   Epoch: 9   Global Step: 121430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:26,587-Speed 3038.72 samples/sec   Loss 7.4055   LearningRate 0.0261   Epoch: 9   Global Step: 121440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:30,032-Speed 2973.25 samples/sec   Loss 7.5834   LearningRate 0.0261   Epoch: 9   Global Step: 121450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:33,404-Speed 3037.86 samples/sec   Loss 7.4779   LearningRate 0.0261   Epoch: 9   Global Step: 121460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:12:36,780-Speed 3033.97 samples/sec   Loss 7.4383   LearningRate 0.0261   Epoch: 9   Global Step: 121470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:12:40,153-Speed 3037.08 samples/sec   Loss 7.3726   LearningRate 0.0261   Epoch: 9   Global Step: 121480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:43,504-Speed 3056.05 samples/sec   Loss 7.4617   LearningRate 0.0261   Epoch: 9   Global Step: 121490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:46,829-Speed 3080.90 samples/sec   Loss 7.4557   LearningRate 0.0261   Epoch: 9   Global Step: 121500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:50,249-Speed 2995.14 samples/sec   Loss 7.3604   LearningRate 0.0261   Epoch: 9   Global Step: 121510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:53,664-Speed 2999.50 samples/sec   Loss 7.4350   LearningRate 0.0261   Epoch: 9   Global Step: 121520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:12:57,117-Speed 2965.96 samples/sec   Loss 7.4075   LearningRate 0.0261   Epoch: 9   Global Step: 121530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:13:00,539-Speed 2994.15 samples/sec   Loss 7.4147   LearningRate 0.0261   Epoch: 9   Global Step: 121540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:13:03,933-Speed 3017.66 samples/sec   Loss 7.4819   LearningRate 0.0261   Epoch: 9   Global Step: 121550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:13:07,384-Speed 2968.19 samples/sec   Loss 7.5615   LearningRate 0.0261   Epoch: 9   Global Step: 121560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:13:10,799-Speed 2998.98 samples/sec   Loss 7.4739   LearningRate 0.0261   Epoch: 9   Global Step: 121570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:13:14,200-Speed 3011.63 samples/sec   Loss 7.4187   LearningRate 0.0261   Epoch: 9   Global Step: 121580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:13:17,545-Speed 3063.16 samples/sec   Loss 7.5210   LearningRate 0.0261   Epoch: 9   Global Step: 121590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:13:20,969-Speed 2992.31 samples/sec   Loss 7.3439   LearningRate 0.0261   Epoch: 9   Global Step: 121600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:13:24,318-Speed 3058.17 samples/sec   Loss 7.3858   LearningRate 0.0261   Epoch: 9   Global Step: 121610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:13:28,302-Speed 2571.31 samples/sec   Loss 7.3497   LearningRate 0.0261   Epoch: 9   Global Step: 121620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:13:32,255-Speed 2590.84 samples/sec   Loss 7.4215   LearningRate 0.0260   Epoch: 9   Global Step: 121630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:13:35,704-Speed 2970.76 samples/sec   Loss 7.4929   LearningRate 0.0260   Epoch: 9   Global Step: 121640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:13:39,140-Speed 2980.57 samples/sec   Loss 7.4510   LearningRate 0.0260   Epoch: 9   Global Step: 121650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:13:42,527-Speed 3023.93 samples/sec   Loss 7.4196   LearningRate 0.0260   Epoch: 9   Global Step: 121660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:13:45,944-Speed 2997.79 samples/sec   Loss 7.4032   LearningRate 0.0260   Epoch: 9   Global Step: 121670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:13:49,362-Speed 2996.99 samples/sec   Loss 7.3949   LearningRate 0.0260   Epoch: 9   Global Step: 121680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:13:52,765-Speed 3010.11 samples/sec   Loss 7.4333   LearningRate 0.0260   Epoch: 9   Global Step: 121690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:13:56,119-Speed 3054.01 samples/sec   Loss 7.4038   LearningRate 0.0260   Epoch: 9   Global Step: 121700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:13:59,497-Speed 3032.06 samples/sec   Loss 7.4045   LearningRate 0.0260   Epoch: 9   Global Step: 121710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:14:02,892-Speed 3017.43 samples/sec   Loss 7.2876   LearningRate 0.0260   Epoch: 9   Global Step: 121720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:14:06,238-Speed 3060.85 samples/sec   Loss 7.2997   LearningRate 0.0260   Epoch: 9   Global Step: 121730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:14:09,694-Speed 2963.97 samples/sec   Loss 7.4372   LearningRate 0.0260   Epoch: 9   Global Step: 121740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:14:13,053-Speed 3049.19 samples/sec   Loss 7.4483   LearningRate 0.0260   Epoch: 9   Global Step: 121750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:14:16,474-Speed 2994.90 samples/sec   Loss 7.4274   LearningRate 0.0260   Epoch: 9   Global Step: 121760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:14:19,979-Speed 2922.08 samples/sec   Loss 7.5073   LearningRate 0.0260   Epoch: 9   Global Step: 121770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:14:23,331-Speed 3055.26 samples/sec   Loss 7.4457   LearningRate 0.0260   Epoch: 9   Global Step: 121780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:14:26,736-Speed 3008.79 samples/sec   Loss 7.4341   LearningRate 0.0260   Epoch: 9   Global Step: 121790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:14:30,167-Speed 2984.72 samples/sec   Loss 7.3115   LearningRate 0.0260   Epoch: 9   Global Step: 121800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:14:33,550-Speed 3028.37 samples/sec   Loss 7.4846   LearningRate 0.0260   Epoch: 9   Global Step: 121810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:14:36,988-Speed 2979.00 samples/sec   Loss 7.5209   LearningRate 0.0260   Epoch: 9   Global Step: 121820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:14:40,401-Speed 3001.55 samples/sec   Loss 7.4679   LearningRate 0.0260   Epoch: 9   Global Step: 121830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:14:43,855-Speed 2965.32 samples/sec   Loss 7.3676   LearningRate 0.0260   Epoch: 9   Global Step: 121840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:14:47,208-Speed 3054.88 samples/sec   Loss 7.4375   LearningRate 0.0260   Epoch: 9   Global Step: 121850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:14:50,564-Speed 3051.93 samples/sec   Loss 7.3536   LearningRate 0.0260   Epoch: 9   Global Step: 121860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:14:53,897-Speed 3073.70 samples/sec   Loss 7.4171   LearningRate 0.0260   Epoch: 9   Global Step: 121870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:14:57,230-Speed 3072.76 samples/sec   Loss 7.3700   LearningRate 0.0259   Epoch: 9   Global Step: 121880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:15:00,597-Speed 3042.40 samples/sec   Loss 7.3495   LearningRate 0.0259   Epoch: 9   Global Step: 121890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:15:03,980-Speed 3027.50 samples/sec   Loss 7.3664   LearningRate 0.0259   Epoch: 9   Global Step: 121900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:15:07,337-Speed 3051.79 samples/sec   Loss 7.4877   LearningRate 0.0259   Epoch: 9   Global Step: 121910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:15:10,766-Speed 2986.58 samples/sec   Loss 7.3548   LearningRate 0.0259   Epoch: 9   Global Step: 121920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:15:14,152-Speed 3025.17 samples/sec   Loss 7.4411   LearningRate 0.0259   Epoch: 9   Global Step: 121930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:15:17,500-Speed 3059.36 samples/sec   Loss 7.4487   LearningRate 0.0259   Epoch: 9   Global Step: 121940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:15:20,909-Speed 3005.44 samples/sec   Loss 7.5398   LearningRate 0.0259   Epoch: 9   Global Step: 121950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:15:24,256-Speed 3060.08 samples/sec   Loss 7.3629   LearningRate 0.0259   Epoch: 9   Global Step: 121960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:15:27,614-Speed 3050.79 samples/sec   Loss 7.3696   LearningRate 0.0259   Epoch: 9   Global Step: 121970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:15:30,940-Speed 3079.48 samples/sec   Loss 7.4552   LearningRate 0.0259   Epoch: 9   Global Step: 121980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:15:34,445-Speed 2922.55 samples/sec   Loss 7.5789   LearningRate 0.0259   Epoch: 9   Global Step: 121990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:15:37,898-Speed 2966.09 samples/sec   Loss 7.4263   LearningRate 0.0259   Epoch: 9   Global Step: 122000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:15:41,314-Speed 2998.90 samples/sec   Loss 7.5680   LearningRate 0.0259   Epoch: 9   Global Step: 122010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:15:44,701-Speed 3023.55 samples/sec   Loss 7.4563   LearningRate 0.0259   Epoch: 9   Global Step: 122020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:15:48,139-Speed 2979.05 samples/sec   Loss 7.3807   LearningRate 0.0259   Epoch: 9   Global Step: 122030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:15:51,584-Speed 2973.56 samples/sec   Loss 7.4159   LearningRate 0.0259   Epoch: 9   Global Step: 122040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:15:54,911-Speed 3078.65 samples/sec   Loss 7.3866   LearningRate 0.0259   Epoch: 9   Global Step: 122050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:15:58,253-Speed 3067.17 samples/sec   Loss 7.3736   LearningRate 0.0259   Epoch: 9   Global Step: 122060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:16:01,607-Speed 3054.27 samples/sec   Loss 7.3361   LearningRate 0.0259   Epoch: 9   Global Step: 122070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:16:05,015-Speed 3005.87 samples/sec   Loss 7.5054   LearningRate 0.0259   Epoch: 9   Global Step: 122080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:16:08,444-Speed 2987.16 samples/sec   Loss 7.3960   LearningRate 0.0259   Epoch: 9   Global Step: 122090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:16:11,811-Speed 3042.13 samples/sec   Loss 7.4745   LearningRate 0.0259   Epoch: 9   Global Step: 122100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:16:15,151-Speed 3067.32 samples/sec   Loss 7.4531   LearningRate 0.0259   Epoch: 9   Global Step: 122110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:16:18,471-Speed 3085.28 samples/sec   Loss 7.4288   LearningRate 0.0258   Epoch: 9   Global Step: 122120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:16:21,831-Speed 3048.05 samples/sec   Loss 7.3546   LearningRate 0.0258   Epoch: 9   Global Step: 122130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:16:25,207-Speed 3034.26 samples/sec   Loss 7.4562   LearningRate 0.0258   Epoch: 9   Global Step: 122140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:16:28,643-Speed 2980.77 samples/sec   Loss 7.3256   LearningRate 0.0258   Epoch: 9   Global Step: 122150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:16:32,057-Speed 3000.39 samples/sec   Loss 7.4595   LearningRate 0.0258   Epoch: 9   Global Step: 122160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:16:35,410-Speed 3055.72 samples/sec   Loss 7.4002   LearningRate 0.0258   Epoch: 9   Global Step: 122170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:16:38,862-Speed 2966.68 samples/sec   Loss 7.3448   LearningRate 0.0258   Epoch: 9   Global Step: 122180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:16:42,289-Speed 2988.67 samples/sec   Loss 7.4952   LearningRate 0.0258   Epoch: 9   Global Step: 122190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:16:45,676-Speed 3023.94 samples/sec   Loss 7.3411   LearningRate 0.0258   Epoch: 9   Global Step: 122200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:16:49,079-Speed 3010.10 samples/sec   Loss 7.3972   LearningRate 0.0258   Epoch: 9   Global Step: 122210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:16:52,471-Speed 3020.25 samples/sec   Loss 7.4438   LearningRate 0.0258   Epoch: 9   Global Step: 122220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:16:55,795-Speed 3080.86 samples/sec   Loss 7.3761   LearningRate 0.0258   Epoch: 9   Global Step: 122230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:16:59,151-Speed 3052.16 samples/sec   Loss 7.4000   LearningRate 0.0258   Epoch: 9   Global Step: 122240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:02,572-Speed 2994.88 samples/sec   Loss 7.4318   LearningRate 0.0258   Epoch: 9   Global Step: 122250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:17:06,115-Speed 2890.43 samples/sec   Loss 7.3413   LearningRate 0.0258   Epoch: 9   Global Step: 122260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:17:09,439-Speed 3081.61 samples/sec   Loss 7.3860   LearningRate 0.0258   Epoch: 9   Global Step: 122270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:12,775-Speed 3070.31 samples/sec   Loss 7.4650   LearningRate 0.0258   Epoch: 9   Global Step: 122280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:16,177-Speed 3010.94 samples/sec   Loss 7.3578   LearningRate 0.0258   Epoch: 9   Global Step: 122290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:19,502-Speed 3080.54 samples/sec   Loss 7.4428   LearningRate 0.0258   Epoch: 9   Global Step: 122300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:22,826-Speed 3081.08 samples/sec   Loss 7.3219   LearningRate 0.0258   Epoch: 9   Global Step: 122310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:26,181-Speed 3052.95 samples/sec   Loss 7.4929   LearningRate 0.0258   Epoch: 9   Global Step: 122320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:29,603-Speed 2993.49 samples/sec   Loss 7.3116   LearningRate 0.0258   Epoch: 9   Global Step: 122330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:33,035-Speed 2984.95 samples/sec   Loss 7.3509   LearningRate 0.0258   Epoch: 9   Global Step: 122340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:36,397-Speed 3045.81 samples/sec   Loss 7.4763   LearningRate 0.0258   Epoch: 9   Global Step: 122350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:39,866-Speed 2952.99 samples/sec   Loss 7.5584   LearningRate 0.0258   Epoch: 9   Global Step: 122360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:43,331-Speed 2955.65 samples/sec   Loss 7.4081   LearningRate 0.0257   Epoch: 9   Global Step: 122370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:46,691-Speed 3048.28 samples/sec   Loss 7.4557   LearningRate 0.0257   Epoch: 9   Global Step: 122380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:50,075-Speed 3027.26 samples/sec   Loss 7.3921   LearningRate 0.0257   Epoch: 9   Global Step: 122390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:53,404-Speed 3076.45 samples/sec   Loss 7.4035   LearningRate 0.0257   Epoch: 9   Global Step: 122400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:17:56,824-Speed 2995.45 samples/sec   Loss 7.3856   LearningRate 0.0257   Epoch: 9   Global Step: 122410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:00,197-Speed 3036.29 samples/sec   Loss 7.3456   LearningRate 0.0257   Epoch: 9   Global Step: 122420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:03,561-Speed 3045.35 samples/sec   Loss 7.3120   LearningRate 0.0257   Epoch: 9   Global Step: 122430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:06,915-Speed 3053.57 samples/sec   Loss 7.2978   LearningRate 0.0257   Epoch: 9   Global Step: 122440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:10,248-Speed 3073.39 samples/sec   Loss 7.4067   LearningRate 0.0257   Epoch: 9   Global Step: 122450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:13,704-Speed 2963.61 samples/sec   Loss 7.3342   LearningRate 0.0257   Epoch: 9   Global Step: 122460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:17,158-Speed 2965.61 samples/sec   Loss 7.3176   LearningRate 0.0257   Epoch: 9   Global Step: 122470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:18:20,547-Speed 3021.84 samples/sec   Loss 7.4570   LearningRate 0.0257   Epoch: 9   Global Step: 122480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:18:24,003-Speed 2964.22 samples/sec   Loss 7.3492   LearningRate 0.0257   Epoch: 9   Global Step: 122490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:18:27,354-Speed 3056.44 samples/sec   Loss 7.3873   LearningRate 0.0257   Epoch: 9   Global Step: 122500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:18:30,701-Speed 3060.03 samples/sec   Loss 7.4861   LearningRate 0.0257   Epoch: 9   Global Step: 122510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:18:34,041-Speed 3067.24 samples/sec   Loss 7.3383   LearningRate 0.0257   Epoch: 9   Global Step: 122520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:37,385-Speed 3062.64 samples/sec   Loss 7.4270   LearningRate 0.0257   Epoch: 9   Global Step: 122530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:40,749-Speed 3044.74 samples/sec   Loss 7.3330   LearningRate 0.0257   Epoch: 9   Global Step: 122540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:44,213-Speed 2957.36 samples/sec   Loss 7.3430   LearningRate 0.0257   Epoch: 9   Global Step: 122550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:47,632-Speed 2996.12 samples/sec   Loss 7.3961   LearningRate 0.0257   Epoch: 9   Global Step: 122560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:51,046-Speed 3000.71 samples/sec   Loss 7.4701   LearningRate 0.0257   Epoch: 9   Global Step: 122570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:54,505-Speed 2960.70 samples/sec   Loss 7.4281   LearningRate 0.0257   Epoch: 9   Global Step: 122580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:18:57,885-Speed 3030.78 samples/sec   Loss 7.4466   LearningRate 0.0257   Epoch: 9   Global Step: 122590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:19:01,221-Speed 3070.56 samples/sec   Loss 7.2763   LearningRate 0.0257   Epoch: 9   Global Step: 122600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:19:04,670-Speed 2969.80 samples/sec   Loss 7.3475   LearningRate 0.0256   Epoch: 9   Global Step: 122610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:19:08,065-Speed 3017.06 samples/sec   Loss 7.3335   LearningRate 0.0256   Epoch: 9   Global Step: 122620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:19:11,428-Speed 3046.34 samples/sec   Loss 7.3631   LearningRate 0.0256   Epoch: 9   Global Step: 122630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:19:14,768-Speed 3066.53 samples/sec   Loss 7.4008   LearningRate 0.0256   Epoch: 9   Global Step: 122640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:19:18,075-Speed 3097.57 samples/sec   Loss 7.3791   LearningRate 0.0256   Epoch: 9   Global Step: 122650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:19:21,453-Speed 3031.99 samples/sec   Loss 7.3721   LearningRate 0.0256   Epoch: 9   Global Step: 122660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:19:24,823-Speed 3039.92 samples/sec   Loss 7.4172   LearningRate 0.0256   Epoch: 9   Global Step: 122670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:19:28,188-Speed 3043.55 samples/sec   Loss 7.2733   LearningRate 0.0256   Epoch: 9   Global Step: 122680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:19:31,559-Speed 3038.13 samples/sec   Loss 7.2225   LearningRate 0.0256   Epoch: 9   Global Step: 122690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:19:34,867-Speed 3096.52 samples/sec   Loss 7.5711   LearningRate 0.0256   Epoch: 9   Global Step: 122700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:19:38,199-Speed 3074.05 samples/sec   Loss 7.3219   LearningRate 0.0256   Epoch: 9   Global Step: 122710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:19:41,573-Speed 3035.99 samples/sec   Loss 7.3048   LearningRate 0.0256   Epoch: 9   Global Step: 122720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:19:44,901-Speed 3077.42 samples/sec   Loss 7.3425   LearningRate 0.0256   Epoch: 9   Global Step: 122730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:19:48,336-Speed 2982.11 samples/sec   Loss 7.3394   LearningRate 0.0256   Epoch: 9   Global Step: 122740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:19:51,666-Speed 3075.66 samples/sec   Loss 7.4059   LearningRate 0.0256   Epoch: 9   Global Step: 122750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:19:55,103-Speed 2980.44 samples/sec   Loss 7.3734   LearningRate 0.0256   Epoch: 9   Global Step: 122760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:19:58,513-Speed 3003.95 samples/sec   Loss 7.4657   LearningRate 0.0256   Epoch: 9   Global Step: 122770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:01,946-Speed 2983.91 samples/sec   Loss 7.3992   LearningRate 0.0256   Epoch: 9   Global Step: 122780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:05,349-Speed 3010.68 samples/sec   Loss 7.4426   LearningRate 0.0256   Epoch: 9   Global Step: 122790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:08,693-Speed 3062.88 samples/sec   Loss 7.3249   LearningRate 0.0256   Epoch: 9   Global Step: 122800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:20:12,083-Speed 3021.53 samples/sec   Loss 7.4599   LearningRate 0.0256   Epoch: 9   Global Step: 122810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:20:15,408-Speed 3079.66 samples/sec   Loss 7.3819   LearningRate 0.0256   Epoch: 9   Global Step: 122820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:20:18,784-Speed 3034.77 samples/sec   Loss 7.3528   LearningRate 0.0256   Epoch: 9   Global Step: 122830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:20:22,174-Speed 3020.94 samples/sec   Loss 7.3521   LearningRate 0.0256   Epoch: 9   Global Step: 122840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:20:25,484-Speed 3094.57 samples/sec   Loss 7.4899   LearningRate 0.0256   Epoch: 9   Global Step: 122850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:28,872-Speed 3023.43 samples/sec   Loss 7.3455   LearningRate 0.0255   Epoch: 9   Global Step: 122860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:32,217-Speed 3061.63 samples/sec   Loss 7.2860   LearningRate 0.0255   Epoch: 9   Global Step: 122870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:35,542-Speed 3080.94 samples/sec   Loss 7.3344   LearningRate 0.0255   Epoch: 9   Global Step: 122880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:38,999-Speed 2963.26 samples/sec   Loss 7.4333   LearningRate 0.0255   Epoch: 9   Global Step: 122890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:42,341-Speed 3064.84 samples/sec   Loss 7.4309   LearningRate 0.0255   Epoch: 9   Global Step: 122900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:45,719-Speed 3031.72 samples/sec   Loss 7.3093   LearningRate 0.0255   Epoch: 9   Global Step: 122910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:49,037-Speed 3087.70 samples/sec   Loss 7.3169   LearningRate 0.0255   Epoch: 9   Global Step: 122920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:52,414-Speed 3033.27 samples/sec   Loss 7.3923   LearningRate 0.0255   Epoch: 9   Global Step: 122930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:55,779-Speed 3042.93 samples/sec   Loss 7.5699   LearningRate 0.0255   Epoch: 9   Global Step: 122940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:20:59,122-Speed 3064.37 samples/sec   Loss 7.4403   LearningRate 0.0255   Epoch: 9   Global Step: 122950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:21:02,414-Speed 3111.88 samples/sec   Loss 7.3818   LearningRate 0.0255   Epoch: 9   Global Step: 122960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:05,759-Speed 3061.90 samples/sec   Loss 7.2818   LearningRate 0.0255   Epoch: 9   Global Step: 122970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:09,132-Speed 3036.79 samples/sec   Loss 7.3625   LearningRate 0.0255   Epoch: 9   Global Step: 122980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:12,573-Speed 2976.84 samples/sec   Loss 7.2928   LearningRate 0.0255   Epoch: 9   Global Step: 122990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:15,955-Speed 3028.26 samples/sec   Loss 7.3870   LearningRate 0.0255   Epoch: 9   Global Step: 123000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:19,368-Speed 3001.65 samples/sec   Loss 7.2922   LearningRate 0.0255   Epoch: 9   Global Step: 123010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:22,816-Speed 2970.32 samples/sec   Loss 7.3863   LearningRate 0.0255   Epoch: 9   Global Step: 123020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:26,183-Speed 3042.30 samples/sec   Loss 7.3237   LearningRate 0.0255   Epoch: 9   Global Step: 123030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:29,529-Speed 3060.75 samples/sec   Loss 7.4127   LearningRate 0.0255   Epoch: 9   Global Step: 123040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:32,924-Speed 3016.95 samples/sec   Loss 7.5166   LearningRate 0.0255   Epoch: 9   Global Step: 123050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:36,260-Speed 3070.43 samples/sec   Loss 7.3570   LearningRate 0.0255   Epoch: 9   Global Step: 123060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:21:39,627-Speed 3042.71 samples/sec   Loss 7.2054   LearningRate 0.0255   Epoch: 9   Global Step: 123070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:21:42,970-Speed 3063.90 samples/sec   Loss 7.3648   LearningRate 0.0255   Epoch: 9   Global Step: 123080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:46,318-Speed 3059.89 samples/sec   Loss 7.3860   LearningRate 0.0255   Epoch: 9   Global Step: 123090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:49,779-Speed 2959.23 samples/sec   Loss 7.4268   LearningRate 0.0254   Epoch: 9   Global Step: 123100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:53,194-Speed 2998.89 samples/sec   Loss 7.3007   LearningRate 0.0254   Epoch: 9   Global Step: 123110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:56,500-Speed 3098.74 samples/sec   Loss 7.2840   LearningRate 0.0254   Epoch: 9   Global Step: 123120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:21:59,884-Speed 3027.05 samples/sec   Loss 7.5576   LearningRate 0.0254   Epoch: 9   Global Step: 123130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:22:03,216-Speed 3074.07 samples/sec   Loss 7.3174   LearningRate 0.0254   Epoch: 9   Global Step: 123140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:22:06,567-Speed 3056.61 samples/sec   Loss 7.3198   LearningRate 0.0254   Epoch: 9   Global Step: 123150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:22:09,909-Speed 3064.04 samples/sec   Loss 7.2368   LearningRate 0.0254   Epoch: 9   Global Step: 123160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:22:13,309-Speed 3013.29 samples/sec   Loss 7.3079   LearningRate 0.0254   Epoch: 9   Global Step: 123170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:22:16,723-Speed 2999.94 samples/sec   Loss 7.1740   LearningRate 0.0254   Epoch: 9   Global Step: 123180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:22:20,087-Speed 3044.39 samples/sec   Loss 7.4638   LearningRate 0.0254   Epoch: 9   Global Step: 123190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:22:23,440-Speed 3054.88 samples/sec   Loss 7.3274   LearningRate 0.0254   Epoch: 9   Global Step: 123200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:22:26,799-Speed 3049.77 samples/sec   Loss 7.3911   LearningRate 0.0254   Epoch: 9   Global Step: 123210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:22:30,199-Speed 3012.89 samples/sec   Loss 7.3589   LearningRate 0.0254   Epoch: 9   Global Step: 123220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:22:33,579-Speed 3030.25 samples/sec   Loss 7.3704   LearningRate 0.0254   Epoch: 9   Global Step: 123230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:22:36,959-Speed 3030.66 samples/sec   Loss 7.5114   LearningRate 0.0254   Epoch: 9   Global Step: 123240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:22:40,381-Speed 2992.89 samples/sec   Loss 7.4002   LearningRate 0.0254   Epoch: 9   Global Step: 123250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:22:43,856-Speed 2947.89 samples/sec   Loss 7.4053   LearningRate 0.0254   Epoch: 9   Global Step: 123260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:22:47,308-Speed 2967.10 samples/sec   Loss 7.4464   LearningRate 0.0254   Epoch: 9   Global Step: 123270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:22:50,739-Speed 2985.39 samples/sec   Loss 7.3301   LearningRate 0.0254   Epoch: 9   Global Step: 123280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:22:54,080-Speed 3065.53 samples/sec   Loss 7.3332   LearningRate 0.0254   Epoch: 9   Global Step: 123290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:22:57,452-Speed 3037.76 samples/sec   Loss 7.2702   LearningRate 0.0254   Epoch: 9   Global Step: 123300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:00,818-Speed 3042.49 samples/sec   Loss 7.4130   LearningRate 0.0254   Epoch: 9   Global Step: 123310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:04,152-Speed 3072.19 samples/sec   Loss 7.3179   LearningRate 0.0254   Epoch: 9   Global Step: 123320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:07,506-Speed 3053.67 samples/sec   Loss 7.4339   LearningRate 0.0254   Epoch: 9   Global Step: 123330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:10,839-Speed 3073.55 samples/sec   Loss 7.3201   LearningRate 0.0254   Epoch: 9   Global Step: 123340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:23:14,190-Speed 3056.13 samples/sec   Loss 7.3162   LearningRate 0.0253   Epoch: 9   Global Step: 123350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:23:17,618-Speed 2988.23 samples/sec   Loss 7.3833   LearningRate 0.0253   Epoch: 9   Global Step: 123360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:21,030-Speed 3002.52 samples/sec   Loss 7.2543   LearningRate 0.0253   Epoch: 9   Global Step: 123370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:24,469-Speed 2978.43 samples/sec   Loss 7.3465   LearningRate 0.0253   Epoch: 9   Global Step: 123380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:27,860-Speed 3020.14 samples/sec   Loss 7.3594   LearningRate 0.0253   Epoch: 9   Global Step: 123390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:31,227-Speed 3042.62 samples/sec   Loss 7.2626   LearningRate 0.0253   Epoch: 9   Global Step: 123400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:34,599-Speed 3036.69 samples/sec   Loss 7.2742   LearningRate 0.0253   Epoch: 9   Global Step: 123410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:37,995-Speed 3016.56 samples/sec   Loss 7.3824   LearningRate 0.0253   Epoch: 9   Global Step: 123420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:41,426-Speed 2985.46 samples/sec   Loss 7.3186   LearningRate 0.0253   Epoch: 9   Global Step: 123430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:44,793-Speed 3042.37 samples/sec   Loss 7.4294   LearningRate 0.0253   Epoch: 9   Global Step: 123440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:48,155-Speed 3046.58 samples/sec   Loss 7.1149   LearningRate 0.0253   Epoch: 9   Global Step: 123450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:23:51,601-Speed 2971.93 samples/sec   Loss 7.1830   LearningRate 0.0253   Epoch: 9   Global Step: 123460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:23:55,064-Speed 2957.55 samples/sec   Loss 7.3452   LearningRate 0.0253   Epoch: 9   Global Step: 123470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:23:58,434-Speed 3039.65 samples/sec   Loss 7.3031   LearningRate 0.0253   Epoch: 9   Global Step: 123480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:24:01,816-Speed 3029.07 samples/sec   Loss 7.3669   LearningRate 0.0253   Epoch: 9   Global Step: 123490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:05,219-Speed 3009.92 samples/sec   Loss 7.3587   LearningRate 0.0253   Epoch: 9   Global Step: 123500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:08,566-Speed 3060.43 samples/sec   Loss 7.3324   LearningRate 0.0253   Epoch: 9   Global Step: 123510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:11,954-Speed 3022.84 samples/sec   Loss 7.3629   LearningRate 0.0253   Epoch: 9   Global Step: 123520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:15,305-Speed 3056.72 samples/sec   Loss 7.4549   LearningRate 0.0253   Epoch: 9   Global Step: 123530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:18,780-Speed 2948.17 samples/sec   Loss 7.3566   LearningRate 0.0253   Epoch: 9   Global Step: 123540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:22,115-Speed 3071.40 samples/sec   Loss 7.2437   LearningRate 0.0253   Epoch: 9   Global Step: 123550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:25,570-Speed 2964.30 samples/sec   Loss 7.1887   LearningRate 0.0253   Epoch: 9   Global Step: 123560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:29,006-Speed 2980.93 samples/sec   Loss 7.3371   LearningRate 0.0253   Epoch: 9   Global Step: 123570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:32,438-Speed 2984.39 samples/sec   Loss 7.1713   LearningRate 0.0253   Epoch: 9   Global Step: 123580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:35,965-Speed 2903.56 samples/sec   Loss 7.4069   LearningRate 0.0253   Epoch: 9   Global Step: 123590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:24:39,489-Speed 2906.96 samples/sec   Loss 7.4439   LearningRate 0.0252   Epoch: 9   Global Step: 123600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:24:42,876-Speed 3024.32 samples/sec   Loss 7.4714   LearningRate 0.0252   Epoch: 9   Global Step: 123610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:24:46,361-Speed 2938.47 samples/sec   Loss 7.2377   LearningRate 0.0252   Epoch: 9   Global Step: 123620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:49,711-Speed 3058.41 samples/sec   Loss 7.2622   LearningRate 0.0252   Epoch: 9   Global Step: 123630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:53,069-Speed 3050.11 samples/sec   Loss 7.3118   LearningRate 0.0252   Epoch: 9   Global Step: 123640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:56,431-Speed 3045.90 samples/sec   Loss 7.4258   LearningRate 0.0252   Epoch: 9   Global Step: 123650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:24:59,752-Speed 3085.28 samples/sec   Loss 7.2218   LearningRate 0.0252   Epoch: 9   Global Step: 123660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:25:03,112-Speed 3047.74 samples/sec   Loss 7.2986   LearningRate 0.0252   Epoch: 9   Global Step: 123670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:25:06,440-Speed 3078.17 samples/sec   Loss 7.3396   LearningRate 0.0252   Epoch: 9   Global Step: 123680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:25:09,785-Speed 3062.48 samples/sec   Loss 7.2223   LearningRate 0.0252   Epoch: 9   Global Step: 123690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:25:13,166-Speed 3028.97 samples/sec   Loss 7.1727   LearningRate 0.0252   Epoch: 9   Global Step: 123700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:25:16,581-Speed 2999.61 samples/sec   Loss 7.1815   LearningRate 0.0252   Epoch: 9   Global Step: 123710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:25:19,936-Speed 3052.73 samples/sec   Loss 7.3220   LearningRate 0.0252   Epoch: 9   Global Step: 123720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:25:23,285-Speed 3058.82 samples/sec   Loss 7.3545   LearningRate 0.0252   Epoch: 9   Global Step: 123730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:25:26,696-Speed 3003.02 samples/sec   Loss 7.2819   LearningRate 0.0252   Epoch: 9   Global Step: 123740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:25:30,099-Speed 3009.90 samples/sec   Loss 7.3165   LearningRate 0.0252   Epoch: 9   Global Step: 123750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:25:33,488-Speed 3021.93 samples/sec   Loss 7.3246   LearningRate 0.0252   Epoch: 9   Global Step: 123760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:25:36,869-Speed 3029.91 samples/sec   Loss 7.2888   LearningRate 0.0252   Epoch: 9   Global Step: 123770   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:25:40,286-Speed 2997.70 samples/sec   Loss 7.3207   LearningRate 0.0252   Epoch: 9   Global Step: 123780   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:25:43,693-Speed 3006.51 samples/sec   Loss 7.3604   LearningRate 0.0252   Epoch: 9   Global Step: 123790   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:25:47,101-Speed 3005.56 samples/sec   Loss 7.2287   LearningRate 0.0252   Epoch: 9   Global Step: 123800   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:25:50,506-Speed 3008.40 samples/sec   Loss 7.3708   LearningRate 0.0252   Epoch: 9   Global Step: 123810   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:25:53,836-Speed 3076.03 samples/sec   Loss 7.2428   LearningRate 0.0252   Epoch: 9   Global Step: 123820   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:25:57,186-Speed 3056.65 samples/sec   Loss 7.4626   LearningRate 0.0252   Epoch: 9   Global Step: 123830   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:26:00,560-Speed 3035.82 samples/sec   Loss 7.4288   LearningRate 0.0251   Epoch: 9   Global Step: 123840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:26:03,895-Speed 3071.87 samples/sec   Loss 7.3675   LearningRate 0.0251   Epoch: 9   Global Step: 123850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:26:07,290-Speed 3016.46 samples/sec   Loss 7.3168   LearningRate 0.0251   Epoch: 9   Global Step: 123860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:26:10,715-Speed 2990.71 samples/sec   Loss 7.3129   LearningRate 0.0251   Epoch: 9   Global Step: 123870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:26:14,064-Speed 3058.59 samples/sec   Loss 7.3151   LearningRate 0.0251   Epoch: 9   Global Step: 123880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:26:17,485-Speed 2993.62 samples/sec   Loss 7.3599   LearningRate 0.0251   Epoch: 9   Global Step: 123890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:26:20,923-Speed 2979.28 samples/sec   Loss 7.3008   LearningRate 0.0251   Epoch: 9   Global Step: 123900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:26:24,323-Speed 3012.57 samples/sec   Loss 7.3195   LearningRate 0.0251   Epoch: 9   Global Step: 123910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:26:27,711-Speed 3024.11 samples/sec   Loss 7.3011   LearningRate 0.0251   Epoch: 9   Global Step: 123920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:26:31,100-Speed 3022.30 samples/sec   Loss 7.3026   LearningRate 0.0251   Epoch: 9   Global Step: 123930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:26:34,448-Speed 3059.49 samples/sec   Loss 7.4222   LearningRate 0.0251   Epoch: 9   Global Step: 123940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:26:37,795-Speed 3060.71 samples/sec   Loss 7.3608   LearningRate 0.0251   Epoch: 9   Global Step: 123950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:26:41,156-Speed 3047.35 samples/sec   Loss 7.2264   LearningRate 0.0251   Epoch: 9   Global Step: 123960   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:26:44,539-Speed 3028.05 samples/sec   Loss 7.3088   LearningRate 0.0251   Epoch: 9   Global Step: 123970   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:26:47,886-Speed 3060.27 samples/sec   Loss 7.3098   LearningRate 0.0251   Epoch: 9   Global Step: 123980   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:26:51,319-Speed 2983.29 samples/sec   Loss 7.2886   LearningRate 0.0251   Epoch: 9   Global Step: 123990   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:26:54,674-Speed 3053.18 samples/sec   Loss 7.2117   LearningRate 0.0251   Epoch: 9   Global Step: 124000   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:26:58,018-Speed 3062.66 samples/sec   Loss 7.3119   LearningRate 0.0251   Epoch: 9   Global Step: 124010   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:27:01,386-Speed 3041.13 samples/sec   Loss 7.1883   LearningRate 0.0251   Epoch: 9   Global Step: 124020   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:27:04,748-Speed 3046.89 samples/sec   Loss 7.3317   LearningRate 0.0251   Epoch: 9   Global Step: 124030   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:27:08,055-Speed 3097.76 samples/sec   Loss 7.2811   LearningRate 0.0251   Epoch: 9   Global Step: 124040   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:27:11,371-Speed 3088.80 samples/sec   Loss 7.3030   LearningRate 0.0251   Epoch: 9   Global Step: 124050   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:27:14,682-Speed 3093.60 samples/sec   Loss 7.3426   LearningRate 0.0251   Epoch: 9   Global Step: 124060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:27:18,069-Speed 3024.54 samples/sec   Loss 7.2543   LearningRate 0.0251   Epoch: 9   Global Step: 124070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:27:21,446-Speed 3032.82 samples/sec   Loss 7.2893   LearningRate 0.0251   Epoch: 9   Global Step: 124080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:27:24,753-Speed 3097.59 samples/sec   Loss 7.3779   LearningRate 0.0250   Epoch: 9   Global Step: 124090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:27:28,142-Speed 3021.84 samples/sec   Loss 7.2138   LearningRate 0.0250   Epoch: 9   Global Step: 124100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:27:31,593-Speed 2968.26 samples/sec   Loss 7.3504   LearningRate 0.0250   Epoch: 9   Global Step: 124110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:27:34,993-Speed 3012.65 samples/sec   Loss 7.2111   LearningRate 0.0250   Epoch: 9   Global Step: 124120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:27:38,345-Speed 3056.23 samples/sec   Loss 7.2713   LearningRate 0.0250   Epoch: 9   Global Step: 124130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:27:41,783-Speed 2979.00 samples/sec   Loss 7.3572   LearningRate 0.0250   Epoch: 9   Global Step: 124140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:27:45,158-Speed 3035.03 samples/sec   Loss 7.3368   LearningRate 0.0250   Epoch: 9   Global Step: 124150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:27:48,514-Speed 3052.37 samples/sec   Loss 7.2942   LearningRate 0.0250   Epoch: 9   Global Step: 124160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:27:51,948-Speed 2982.35 samples/sec   Loss 7.3068   LearningRate 0.0250   Epoch: 9   Global Step: 124170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:27:55,440-Speed 2933.67 samples/sec   Loss 7.2917   LearningRate 0.0250   Epoch: 9   Global Step: 124180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:27:58,799-Speed 3049.68 samples/sec   Loss 7.2394   LearningRate 0.0250   Epoch: 9   Global Step: 124190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:28:02,546-Speed 2733.53 samples/sec   Loss 7.3207   LearningRate 0.0250   Epoch: 9   Global Step: 124200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:28:05,995-Speed 2969.07 samples/sec   Loss 7.2968   LearningRate 0.0250   Epoch: 9   Global Step: 124210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:28:38,848-Speed 311.71 samples/sec   Loss 5.8874   LearningRate 0.0250   Epoch: 10   Global Step: 124220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:28:42,382-Speed 2899.57 samples/sec   Loss 5.7777   LearningRate 0.0250   Epoch: 10   Global Step: 124230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:28:45,774-Speed 3019.88 samples/sec   Loss 5.7393   LearningRate 0.0250   Epoch: 10   Global Step: 124240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:28:49,223-Speed 2969.93 samples/sec   Loss 5.8783   LearningRate 0.0250   Epoch: 10   Global Step: 124250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:28:52,601-Speed 3032.27 samples/sec   Loss 5.8722   LearningRate 0.0250   Epoch: 10   Global Step: 124260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:28:55,993-Speed 3019.37 samples/sec   Loss 5.7411   LearningRate 0.0250   Epoch: 10   Global Step: 124270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:28:59,333-Speed 3066.76 samples/sec   Loss 5.8612   LearningRate 0.0250   Epoch: 10   Global Step: 124280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:29:02,739-Speed 3008.10 samples/sec   Loss 5.8956   LearningRate 0.0250   Epoch: 10   Global Step: 124290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:29:06,140-Speed 3011.22 samples/sec   Loss 5.8281   LearningRate 0.0250   Epoch: 10   Global Step: 124300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:29:09,573-Speed 2984.69 samples/sec   Loss 5.8799   LearningRate 0.0250   Epoch: 10   Global Step: 124310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:29:12,971-Speed 3013.70 samples/sec   Loss 5.9513   LearningRate 0.0250   Epoch: 10   Global Step: 124320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:29:16,540-Speed 2869.99 samples/sec   Loss 5.9344   LearningRate 0.0250   Epoch: 10   Global Step: 124330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:29:19,938-Speed 3014.53 samples/sec   Loss 5.8734   LearningRate 0.0249   Epoch: 10   Global Step: 124340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:29:23,280-Speed 3064.65 samples/sec   Loss 5.8706   LearningRate 0.0249   Epoch: 10   Global Step: 124350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:29:26,746-Speed 2955.11 samples/sec   Loss 5.9110   LearningRate 0.0249   Epoch: 10   Global Step: 124360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:29:30,437-Speed 2775.58 samples/sec   Loss 5.7492   LearningRate 0.0249   Epoch: 10   Global Step: 124370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:29:33,803-Speed 3043.07 samples/sec   Loss 5.8033   LearningRate 0.0249   Epoch: 10   Global Step: 124380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:29:37,220-Speed 2997.58 samples/sec   Loss 5.9446   LearningRate 0.0249   Epoch: 10   Global Step: 124390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:29:40,784-Speed 2875.13 samples/sec   Loss 5.8064   LearningRate 0.0249   Epoch: 10   Global Step: 124400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:29:44,106-Speed 3084.16 samples/sec   Loss 5.8992   LearningRate 0.0249   Epoch: 10   Global Step: 124410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:29:47,463-Speed 3051.29 samples/sec   Loss 5.7588   LearningRate 0.0249   Epoch: 10   Global Step: 124420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:29:50,793-Speed 3075.57 samples/sec   Loss 5.9502   LearningRate 0.0249   Epoch: 10   Global Step: 124430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:29:54,125-Speed 3073.83 samples/sec   Loss 5.9151   LearningRate 0.0249   Epoch: 10   Global Step: 124440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:29:57,500-Speed 3035.35 samples/sec   Loss 5.9457   LearningRate 0.0249   Epoch: 10   Global Step: 124450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:30:00,902-Speed 3010.89 samples/sec   Loss 6.0079   LearningRate 0.0249   Epoch: 10   Global Step: 124460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:30:04,238-Speed 3070.65 samples/sec   Loss 5.9348   LearningRate 0.0249   Epoch: 10   Global Step: 124470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:30:07,634-Speed 3016.15 samples/sec   Loss 5.7925   LearningRate 0.0249   Epoch: 10   Global Step: 124480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:11,119-Speed 2939.24 samples/sec   Loss 5.9380   LearningRate 0.0249   Epoch: 10   Global Step: 124490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:14,496-Speed 3033.31 samples/sec   Loss 5.9191   LearningRate 0.0249   Epoch: 10   Global Step: 124500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:17,898-Speed 3010.80 samples/sec   Loss 6.0102   LearningRate 0.0249   Epoch: 10   Global Step: 124510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:21,261-Speed 3046.63 samples/sec   Loss 5.9246   LearningRate 0.0249   Epoch: 10   Global Step: 124520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:24,579-Speed 3087.19 samples/sec   Loss 5.8591   LearningRate 0.0249   Epoch: 10   Global Step: 124530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:28,003-Speed 2991.30 samples/sec   Loss 5.8342   LearningRate 0.0249   Epoch: 10   Global Step: 124540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:31,404-Speed 3012.81 samples/sec   Loss 6.0412   LearningRate 0.0249   Epoch: 10   Global Step: 124550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:34,815-Speed 3002.15 samples/sec   Loss 5.9893   LearningRate 0.0249   Epoch: 10   Global Step: 124560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:38,216-Speed 3012.37 samples/sec   Loss 5.9411   LearningRate 0.0249   Epoch: 10   Global Step: 124570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:41,564-Speed 3058.97 samples/sec   Loss 6.0171   LearningRate 0.0249   Epoch: 10   Global Step: 124580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:30:44,955-Speed 3020.62 samples/sec   Loss 6.0004   LearningRate 0.0248   Epoch: 10   Global Step: 124590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:30:48,333-Speed 3032.68 samples/sec   Loss 5.9817   LearningRate 0.0248   Epoch: 10   Global Step: 124600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:51,866-Speed 2899.26 samples/sec   Loss 5.8711   LearningRate 0.0248   Epoch: 10   Global Step: 124610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:55,308-Speed 2976.11 samples/sec   Loss 5.8815   LearningRate 0.0248   Epoch: 10   Global Step: 124620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:30:58,681-Speed 3036.95 samples/sec   Loss 5.8970   LearningRate 0.0248   Epoch: 10   Global Step: 124630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:02,035-Speed 3053.32 samples/sec   Loss 6.0216   LearningRate 0.0248   Epoch: 10   Global Step: 124640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:05,361-Speed 3079.52 samples/sec   Loss 5.9575   LearningRate 0.0248   Epoch: 10   Global Step: 124650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:08,840-Speed 2944.63 samples/sec   Loss 6.0967   LearningRate 0.0248   Epoch: 10   Global Step: 124660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:12,251-Speed 3003.33 samples/sec   Loss 5.9831   LearningRate 0.0248   Epoch: 10   Global Step: 124670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:15,658-Speed 3006.50 samples/sec   Loss 6.0232   LearningRate 0.0248   Epoch: 10   Global Step: 124680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:19,059-Speed 3011.81 samples/sec   Loss 5.9658   LearningRate 0.0248   Epoch: 10   Global Step: 124690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:22,495-Speed 2980.76 samples/sec   Loss 6.0858   LearningRate 0.0248   Epoch: 10   Global Step: 124700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:31:25,874-Speed 3031.97 samples/sec   Loss 5.9775   LearningRate 0.0248   Epoch: 10   Global Step: 124710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:31:29,239-Speed 3043.58 samples/sec   Loss 5.9859   LearningRate 0.0248   Epoch: 10   Global Step: 124720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:31:32,593-Speed 3054.31 samples/sec   Loss 6.1135   LearningRate 0.0248   Epoch: 10   Global Step: 124730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:35,917-Speed 3080.99 samples/sec   Loss 5.9524   LearningRate 0.0248   Epoch: 10   Global Step: 124740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:39,272-Speed 3054.12 samples/sec   Loss 6.0654   LearningRate 0.0248   Epoch: 10   Global Step: 124750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:42,642-Speed 3040.22 samples/sec   Loss 6.1252   LearningRate 0.0248   Epoch: 10   Global Step: 124760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:46,006-Speed 3044.79 samples/sec   Loss 6.0978   LearningRate 0.0248   Epoch: 10   Global Step: 124770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:49,371-Speed 3043.90 samples/sec   Loss 6.0929   LearningRate 0.0248   Epoch: 10   Global Step: 124780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:52,844-Speed 2948.80 samples/sec   Loss 6.0671   LearningRate 0.0248   Epoch: 10   Global Step: 124790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:56,214-Speed 3039.79 samples/sec   Loss 6.0208   LearningRate 0.0248   Epoch: 10   Global Step: 124800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:31:59,578-Speed 3045.00 samples/sec   Loss 6.0429   LearningRate 0.0248   Epoch: 10   Global Step: 124810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:32:02,922-Speed 3062.68 samples/sec   Loss 5.9874   LearningRate 0.0248   Epoch: 10   Global Step: 124820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:32:06,329-Speed 3006.50 samples/sec   Loss 6.0470   LearningRate 0.0248   Epoch: 10   Global Step: 124830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:09,687-Speed 3050.56 samples/sec   Loss 5.9791   LearningRate 0.0247   Epoch: 10   Global Step: 124840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:13,089-Speed 3011.09 samples/sec   Loss 6.0646   LearningRate 0.0247   Epoch: 10   Global Step: 124850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:16,436-Speed 3060.32 samples/sec   Loss 6.0488   LearningRate 0.0247   Epoch: 10   Global Step: 124860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:19,841-Speed 3007.80 samples/sec   Loss 5.9191   LearningRate 0.0247   Epoch: 10   Global Step: 124870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:23,233-Speed 3019.90 samples/sec   Loss 6.0492   LearningRate 0.0247   Epoch: 10   Global Step: 124880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:26,682-Speed 2969.63 samples/sec   Loss 6.0458   LearningRate 0.0247   Epoch: 10   Global Step: 124890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:30,076-Speed 3018.63 samples/sec   Loss 6.1075   LearningRate 0.0247   Epoch: 10   Global Step: 124900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:33,432-Speed 3051.59 samples/sec   Loss 6.0945   LearningRate 0.0247   Epoch: 10   Global Step: 124910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:36,818-Speed 3025.67 samples/sec   Loss 6.0958   LearningRate 0.0247   Epoch: 10   Global Step: 124920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:40,172-Speed 3054.08 samples/sec   Loss 6.0867   LearningRate 0.0247   Epoch: 10   Global Step: 124930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:32:43,530-Speed 3050.45 samples/sec   Loss 6.1933   LearningRate 0.0247   Epoch: 10   Global Step: 124940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:32:46,903-Speed 3036.32 samples/sec   Loss 6.0523   LearningRate 0.0247   Epoch: 10   Global Step: 124950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:50,265-Speed 3046.40 samples/sec   Loss 6.0920   LearningRate 0.0247   Epoch: 10   Global Step: 124960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:53,726-Speed 2959.52 samples/sec   Loss 6.1776   LearningRate 0.0247   Epoch: 10   Global Step: 124970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:32:57,139-Speed 3002.24 samples/sec   Loss 6.0994   LearningRate 0.0247   Epoch: 10   Global Step: 124980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:33:00,571-Speed 2985.18 samples/sec   Loss 6.1097   LearningRate 0.0247   Epoch: 10   Global Step: 124990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:33:03,933-Speed 3046.65 samples/sec   Loss 6.2016   LearningRate 0.0247   Epoch: 10   Global Step: 125000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:33:07,316-Speed 3028.19 samples/sec   Loss 6.2573   LearningRate 0.0247   Epoch: 10   Global Step: 125010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:33:10,736-Speed 2995.57 samples/sec   Loss 6.1095   LearningRate 0.0247   Epoch: 10   Global Step: 125020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:33:14,168-Speed 2984.57 samples/sec   Loss 6.0694   LearningRate 0.0247   Epoch: 10   Global Step: 125030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:33:17,625-Speed 2962.87 samples/sec   Loss 6.2039   LearningRate 0.0247   Epoch: 10   Global Step: 125040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:33:20,963-Speed 3068.49 samples/sec   Loss 6.0642   LearningRate 0.0247   Epoch: 10   Global Step: 125050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:33:24,327-Speed 3044.66 samples/sec   Loss 6.1769   LearningRate 0.0247   Epoch: 10   Global Step: 125060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:33:27,736-Speed 3004.79 samples/sec   Loss 6.1745   LearningRate 0.0247   Epoch: 10   Global Step: 125070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:33:31,108-Speed 3038.15 samples/sec   Loss 6.0198   LearningRate 0.0247   Epoch: 10   Global Step: 125080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:33:34,435-Speed 3078.88 samples/sec   Loss 6.1075   LearningRate 0.0246   Epoch: 10   Global Step: 125090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:33:37,849-Speed 3000.39 samples/sec   Loss 6.1681   LearningRate 0.0246   Epoch: 10   Global Step: 125100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:33:41,181-Speed 3073.88 samples/sec   Loss 6.1760   LearningRate 0.0246   Epoch: 10   Global Step: 125110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:33:44,625-Speed 2974.56 samples/sec   Loss 6.1804   LearningRate 0.0246   Epoch: 10   Global Step: 125120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:33:48,068-Speed 2974.86 samples/sec   Loss 6.1729   LearningRate 0.0246   Epoch: 10   Global Step: 125130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:33:51,502-Speed 2983.05 samples/sec   Loss 6.1693   LearningRate 0.0246   Epoch: 10   Global Step: 125140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:33:54,995-Speed 2932.12 samples/sec   Loss 6.1406   LearningRate 0.0246   Epoch: 10   Global Step: 125150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:33:58,538-Speed 2891.69 samples/sec   Loss 6.1960   LearningRate 0.0246   Epoch: 10   Global Step: 125160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:01,916-Speed 3031.84 samples/sec   Loss 6.1558   LearningRate 0.0246   Epoch: 10   Global Step: 125170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:05,406-Speed 2934.72 samples/sec   Loss 6.1923   LearningRate 0.0246   Epoch: 10   Global Step: 125180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:08,840-Speed 2983.23 samples/sec   Loss 6.2315   LearningRate 0.0246   Epoch: 10   Global Step: 125190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:12,219-Speed 3031.23 samples/sec   Loss 6.1924   LearningRate 0.0246   Epoch: 10   Global Step: 125200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:15,603-Speed 3027.39 samples/sec   Loss 6.1800   LearningRate 0.0246   Epoch: 10   Global Step: 125210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:19,035-Speed 2983.62 samples/sec   Loss 6.1174   LearningRate 0.0246   Epoch: 10   Global Step: 125220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:34:22,493-Speed 2962.88 samples/sec   Loss 6.2243   LearningRate 0.0246   Epoch: 10   Global Step: 125230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:34:25,952-Speed 2961.12 samples/sec   Loss 6.0815   LearningRate 0.0246   Epoch: 10   Global Step: 125240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:34:29,307-Speed 3054.52 samples/sec   Loss 6.2996   LearningRate 0.0246   Epoch: 10   Global Step: 125250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:32,761-Speed 2964.96 samples/sec   Loss 6.1384   LearningRate 0.0246   Epoch: 10   Global Step: 125260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:36,157-Speed 3016.07 samples/sec   Loss 6.2461   LearningRate 0.0246   Epoch: 10   Global Step: 125270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:39,492-Speed 3071.94 samples/sec   Loss 6.2082   LearningRate 0.0246   Epoch: 10   Global Step: 125280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:42,877-Speed 3026.17 samples/sec   Loss 6.3640   LearningRate 0.0246   Epoch: 10   Global Step: 125290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:46,248-Speed 3037.89 samples/sec   Loss 6.3313   LearningRate 0.0246   Epoch: 10   Global Step: 125300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:49,705-Speed 2963.73 samples/sec   Loss 6.2012   LearningRate 0.0246   Epoch: 10   Global Step: 125310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:53,133-Speed 2987.62 samples/sec   Loss 6.2326   LearningRate 0.0246   Epoch: 10   Global Step: 125320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:56,552-Speed 2996.10 samples/sec   Loss 6.0802   LearningRate 0.0246   Epoch: 10   Global Step: 125330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:34:59,970-Speed 2996.56 samples/sec   Loss 6.1814   LearningRate 0.0245   Epoch: 10   Global Step: 125340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:35:03,385-Speed 2999.73 samples/sec   Loss 6.3097   LearningRate 0.0245   Epoch: 10   Global Step: 125350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:35:06,736-Speed 3056.24 samples/sec   Loss 6.1450   LearningRate 0.0245   Epoch: 10   Global Step: 125360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:35:10,130-Speed 3018.46 samples/sec   Loss 6.2153   LearningRate 0.0245   Epoch: 10   Global Step: 125370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:35:13,613-Speed 2941.28 samples/sec   Loss 6.3516   LearningRate 0.0245   Epoch: 10   Global Step: 125380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:35:17,025-Speed 3001.79 samples/sec   Loss 6.2831   LearningRate 0.0245   Epoch: 10   Global Step: 125390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:35:20,458-Speed 2983.61 samples/sec   Loss 6.2866   LearningRate 0.0245   Epoch: 10   Global Step: 125400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:35:23,817-Speed 3049.25 samples/sec   Loss 6.2193   LearningRate 0.0245   Epoch: 10   Global Step: 125410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:35:27,230-Speed 3001.48 samples/sec   Loss 6.2311   LearningRate 0.0245   Epoch: 10   Global Step: 125420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:35:30,615-Speed 3026.50 samples/sec   Loss 6.1877   LearningRate 0.0245   Epoch: 10   Global Step: 125430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:35:33,967-Speed 3055.89 samples/sec   Loss 6.2729   LearningRate 0.0245   Epoch: 10   Global Step: 125440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:35:37,379-Speed 3002.67 samples/sec   Loss 6.3658   LearningRate 0.0245   Epoch: 10   Global Step: 125450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:35:40,832-Speed 2966.62 samples/sec   Loss 6.2781   LearningRate 0.0245   Epoch: 10   Global Step: 125460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:35:44,221-Speed 3022.53 samples/sec   Loss 6.3134   LearningRate 0.0245   Epoch: 10   Global Step: 125470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:35:47,613-Speed 3019.96 samples/sec   Loss 6.2909   LearningRate 0.0245   Epoch: 10   Global Step: 125480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:35:51,092-Speed 2944.35 samples/sec   Loss 6.2581   LearningRate 0.0245   Epoch: 10   Global Step: 125490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:35:54,495-Speed 3009.55 samples/sec   Loss 6.3540   LearningRate 0.0245   Epoch: 10   Global Step: 125500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:35:57,850-Speed 3053.11 samples/sec   Loss 6.2584   LearningRate 0.0245   Epoch: 10   Global Step: 125510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:36:01,230-Speed 3031.04 samples/sec   Loss 6.3550   LearningRate 0.0245   Epoch: 10   Global Step: 125520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:36:04,694-Speed 2956.37 samples/sec   Loss 6.3952   LearningRate 0.0245   Epoch: 10   Global Step: 125530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:36:08,030-Speed 3071.15 samples/sec   Loss 6.3489   LearningRate 0.0245   Epoch: 10   Global Step: 125540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:36:11,412-Speed 3028.35 samples/sec   Loss 6.2560   LearningRate 0.0245   Epoch: 10   Global Step: 125550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:14,844-Speed 2984.28 samples/sec   Loss 6.3241   LearningRate 0.0245   Epoch: 10   Global Step: 125560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:18,265-Speed 2994.47 samples/sec   Loss 6.2676   LearningRate 0.0245   Epoch: 10   Global Step: 125570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:21,683-Speed 2996.48 samples/sec   Loss 6.2978   LearningRate 0.0245   Epoch: 10   Global Step: 125580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:25,048-Speed 3044.33 samples/sec   Loss 6.2493   LearningRate 0.0244   Epoch: 10   Global Step: 125590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:28,402-Speed 3054.33 samples/sec   Loss 6.2874   LearningRate 0.0244   Epoch: 10   Global Step: 125600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:31,757-Speed 3052.34 samples/sec   Loss 6.3756   LearningRate 0.0244   Epoch: 10   Global Step: 125610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:35,174-Speed 2997.53 samples/sec   Loss 6.2594   LearningRate 0.0244   Epoch: 10   Global Step: 125620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:38,554-Speed 3031.21 samples/sec   Loss 6.3522   LearningRate 0.0244   Epoch: 10   Global Step: 125630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:41,893-Speed 3067.54 samples/sec   Loss 6.3094   LearningRate 0.0244   Epoch: 10   Global Step: 125640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:45,309-Speed 2997.81 samples/sec   Loss 6.3958   LearningRate 0.0244   Epoch: 10   Global Step: 125650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:36:48,659-Speed 3058.56 samples/sec   Loss 6.4275   LearningRate 0.0244   Epoch: 10   Global Step: 125660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:52,033-Speed 3035.87 samples/sec   Loss 6.2436   LearningRate 0.0244   Epoch: 10   Global Step: 125670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:55,438-Speed 3008.36 samples/sec   Loss 6.3731   LearningRate 0.0244   Epoch: 10   Global Step: 125680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:36:58,825-Speed 3023.84 samples/sec   Loss 6.3721   LearningRate 0.0244   Epoch: 10   Global Step: 125690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:02,158-Speed 3073.23 samples/sec   Loss 6.3447   LearningRate 0.0244   Epoch: 10   Global Step: 125700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:05,579-Speed 2993.81 samples/sec   Loss 6.3056   LearningRate 0.0244   Epoch: 10   Global Step: 125710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:08,995-Speed 2998.85 samples/sec   Loss 6.4091   LearningRate 0.0244   Epoch: 10   Global Step: 125720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:12,343-Speed 3059.88 samples/sec   Loss 6.3936   LearningRate 0.0244   Epoch: 10   Global Step: 125730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:15,776-Speed 2982.67 samples/sec   Loss 6.3175   LearningRate 0.0244   Epoch: 10   Global Step: 125740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:19,217-Speed 2976.98 samples/sec   Loss 6.3973   LearningRate 0.0244   Epoch: 10   Global Step: 125750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:22,625-Speed 3005.93 samples/sec   Loss 6.4568   LearningRate 0.0244   Epoch: 10   Global Step: 125760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 13:37:25,997-Speed 3037.25 samples/sec   Loss 6.3497   LearningRate 0.0244   Epoch: 10   Global Step: 125770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:29,377-Speed 3030.67 samples/sec   Loss 6.4748   LearningRate 0.0244   Epoch: 10   Global Step: 125780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:32,789-Speed 3001.79 samples/sec   Loss 6.3802   LearningRate 0.0244   Epoch: 10   Global Step: 125790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:36,127-Speed 3068.38 samples/sec   Loss 6.3005   LearningRate 0.0244   Epoch: 10   Global Step: 125800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:39,505-Speed 3033.44 samples/sec   Loss 6.4440   LearningRate 0.0244   Epoch: 10   Global Step: 125810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:42,845-Speed 3066.34 samples/sec   Loss 6.4619   LearningRate 0.0244   Epoch: 10   Global Step: 125820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:46,170-Speed 3080.32 samples/sec   Loss 6.4154   LearningRate 0.0244   Epoch: 10   Global Step: 125830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:49,568-Speed 3015.07 samples/sec   Loss 6.5545   LearningRate 0.0243   Epoch: 10   Global Step: 125840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:53,034-Speed 2955.63 samples/sec   Loss 6.4110   LearningRate 0.0243   Epoch: 10   Global Step: 125850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:37:56,355-Speed 3083.97 samples/sec   Loss 6.3326   LearningRate 0.0243   Epoch: 10   Global Step: 125860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:37:59,732-Speed 3033.28 samples/sec   Loss 6.3975   LearningRate 0.0243   Epoch: 10   Global Step: 125870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:38:03,079-Speed 3060.97 samples/sec   Loss 6.3931   LearningRate 0.0243   Epoch: 10   Global Step: 125880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:38:06,424-Speed 3061.95 samples/sec   Loss 6.4033   LearningRate 0.0243   Epoch: 10   Global Step: 125890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:38:09,735-Speed 3093.57 samples/sec   Loss 6.4396   LearningRate 0.0243   Epoch: 10   Global Step: 125900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:38:13,075-Speed 3067.04 samples/sec   Loss 6.4191   LearningRate 0.0243   Epoch: 10   Global Step: 125910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:38:16,397-Speed 3083.32 samples/sec   Loss 6.4125   LearningRate 0.0243   Epoch: 10   Global Step: 125920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:38:19,776-Speed 3031.56 samples/sec   Loss 6.4984   LearningRate 0.0243   Epoch: 10   Global Step: 125930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:38:23,131-Speed 3052.44 samples/sec   Loss 6.3679   LearningRate 0.0243   Epoch: 10   Global Step: 125940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:38:26,425-Speed 3109.94 samples/sec   Loss 6.4735   LearningRate 0.0243   Epoch: 10   Global Step: 125950   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:38:29,827-Speed 3010.48 samples/sec   Loss 6.3641   LearningRate 0.0243   Epoch: 10   Global Step: 125960   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:38:33,141-Speed 3090.93 samples/sec   Loss 6.4499   LearningRate 0.0243   Epoch: 10   Global Step: 125970   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:38:36,448-Speed 3097.35 samples/sec   Loss 6.3175   LearningRate 0.0243   Epoch: 10   Global Step: 125980   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:38:39,796-Speed 3059.99 samples/sec   Loss 6.4741   LearningRate 0.0243   Epoch: 10   Global Step: 125990   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:38:43,142-Speed 3060.70 samples/sec   Loss 6.4282   LearningRate 0.0243   Epoch: 10   Global Step: 126000   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:38:46,544-Speed 3010.37 samples/sec   Loss 6.3851   LearningRate 0.0243   Epoch: 10   Global Step: 126010   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:38:49,878-Speed 3072.85 samples/sec   Loss 6.4208   LearningRate 0.0243   Epoch: 10   Global Step: 126020   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:38:53,244-Speed 3043.03 samples/sec   Loss 6.3919   LearningRate 0.0243   Epoch: 10   Global Step: 126030   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:38:56,546-Speed 3101.64 samples/sec   Loss 6.4096   LearningRate 0.0243   Epoch: 10   Global Step: 126040   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:38:59,913-Speed 3042.53 samples/sec   Loss 6.4717   LearningRate 0.0243   Epoch: 10   Global Step: 126050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:03,229-Speed 3089.78 samples/sec   Loss 6.5060   LearningRate 0.0243   Epoch: 10   Global Step: 126060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:06,598-Speed 3039.52 samples/sec   Loss 6.3286   LearningRate 0.0243   Epoch: 10   Global Step: 126070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:09,941-Speed 3064.56 samples/sec   Loss 6.4110   LearningRate 0.0243   Epoch: 10   Global Step: 126080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:13,316-Speed 3034.86 samples/sec   Loss 6.4446   LearningRate 0.0242   Epoch: 10   Global Step: 126090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:16,669-Speed 3055.34 samples/sec   Loss 6.4652   LearningRate 0.0242   Epoch: 10   Global Step: 126100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:20,097-Speed 2988.08 samples/sec   Loss 6.4384   LearningRate 0.0242   Epoch: 10   Global Step: 126110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:23,478-Speed 3029.38 samples/sec   Loss 6.4504   LearningRate 0.0242   Epoch: 10   Global Step: 126120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:26,872-Speed 3018.65 samples/sec   Loss 6.4889   LearningRate 0.0242   Epoch: 10   Global Step: 126130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:30,330-Speed 2962.33 samples/sec   Loss 6.4236   LearningRate 0.0242   Epoch: 10   Global Step: 126140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:33,780-Speed 2969.00 samples/sec   Loss 6.4950   LearningRate 0.0242   Epoch: 10   Global Step: 126150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:39:37,221-Speed 2976.12 samples/sec   Loss 6.4549   LearningRate 0.0242   Epoch: 10   Global Step: 126160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 13:39:40,626-Speed 3008.97 samples/sec   Loss 6.3724   LearningRate 0.0242   Epoch: 10   Global Step: 126170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:44,065-Speed 2978.00 samples/sec   Loss 6.4465   LearningRate 0.0242   Epoch: 10   Global Step: 126180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:47,482-Speed 3000.73 samples/sec   Loss 6.5340   LearningRate 0.0242   Epoch: 10   Global Step: 126190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:50,830-Speed 3059.68 samples/sec   Loss 6.5067   LearningRate 0.0242   Epoch: 10   Global Step: 126200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:54,246-Speed 2998.00 samples/sec   Loss 6.4610   LearningRate 0.0242   Epoch: 10   Global Step: 126210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:39:57,604-Speed 3050.80 samples/sec   Loss 6.4372   LearningRate 0.0242   Epoch: 10   Global Step: 126220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:40:00,960-Speed 3052.20 samples/sec   Loss 6.4716   LearningRate 0.0242   Epoch: 10   Global Step: 126230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:40:04,330-Speed 3038.50 samples/sec   Loss 6.5408   LearningRate 0.0242   Epoch: 10   Global Step: 126240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:40:07,661-Speed 3075.56 samples/sec   Loss 6.4752   LearningRate 0.0242   Epoch: 10   Global Step: 126250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:40:11,061-Speed 3012.58 samples/sec   Loss 6.5862   LearningRate 0.0242   Epoch: 10   Global Step: 126260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:40:14,473-Speed 3001.87 samples/sec   Loss 6.3982   LearningRate 0.0242   Epoch: 10   Global Step: 126270   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:40:17,813-Speed 3066.78 samples/sec   Loss 6.4353   LearningRate 0.0242   Epoch: 10   Global Step: 126280   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:40:21,195-Speed 3029.53 samples/sec   Loss 6.6039   LearningRate 0.0242   Epoch: 10   Global Step: 126290   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:40:24,556-Speed 3047.35 samples/sec   Loss 6.4683   LearningRate 0.0242   Epoch: 10   Global Step: 126300   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:40:27,948-Speed 3019.85 samples/sec   Loss 6.4381   LearningRate 0.0242   Epoch: 10   Global Step: 126310   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:40:31,328-Speed 3030.73 samples/sec   Loss 6.4479   LearningRate 0.0242   Epoch: 10   Global Step: 126320   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:40:34,712-Speed 3026.90 samples/sec   Loss 6.6580   LearningRate 0.0242   Epoch: 10   Global Step: 126330   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:40:38,059-Speed 3060.18 samples/sec   Loss 6.4765   LearningRate 0.0241   Epoch: 10   Global Step: 126340   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:40:41,450-Speed 3020.83 samples/sec   Loss 6.4192   LearningRate 0.0241   Epoch: 10   Global Step: 126350   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:40:44,763-Speed 3091.58 samples/sec   Loss 6.6319   LearningRate 0.0241   Epoch: 10   Global Step: 126360   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 13:40:48,139-Speed 3034.11 samples/sec   Loss 6.4668   LearningRate 0.0241   Epoch: 10   Global Step: 126370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:40:51,499-Speed 3048.39 samples/sec   Loss 6.5178   LearningRate 0.0241   Epoch: 10   Global Step: 126380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:40:54,862-Speed 3046.13 samples/sec   Loss 6.5669   LearningRate 0.0241   Epoch: 10   Global Step: 126390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:40:58,217-Speed 3052.75 samples/sec   Loss 6.5568   LearningRate 0.0241   Epoch: 10   Global Step: 126400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:41:01,606-Speed 3023.17 samples/sec   Loss 6.5423   LearningRate 0.0241   Epoch: 10   Global Step: 126410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:41:04,963-Speed 3050.70 samples/sec   Loss 6.5376   LearningRate 0.0241   Epoch: 10   Global Step: 126420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:41:08,370-Speed 3006.26 samples/sec   Loss 6.5064   LearningRate 0.0241   Epoch: 10   Global Step: 126430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 13:41:11,798-Speed 2988.72 samples/sec   Loss 6.6052   LearningRate 0.0241   Epoch: 10   Global Step: 126440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:41:15,274-Speed 2945.77 samples/sec   Loss 6.4939   LearningRate 0.0241   Epoch: 10   Global Step: 126450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:41:18,731-Speed 2963.84 samples/sec   Loss 6.5456   LearningRate 0.0241   Epoch: 10   Global Step: 126460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:41:22,136-Speed 3008.22 samples/sec   Loss 6.5709   LearningRate 0.0241   Epoch: 10   Global Step: 126470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:41:25,525-Speed 3022.42 samples/sec   Loss 6.5668   LearningRate 0.0241   Epoch: 10   Global Step: 126480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:41:29,000-Speed 2947.36 samples/sec   Loss 6.6277   LearningRate 0.0241   Epoch: 10   Global Step: 126490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:41:32,342-Speed 3064.76 samples/sec   Loss 6.4183   LearningRate 0.0241   Epoch: 10   Global Step: 126500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:41:35,695-Speed 3055.75 samples/sec   Loss 6.5807   LearningRate 0.0241   Epoch: 10   Global Step: 126510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:41:39,057-Speed 3046.58 samples/sec   Loss 6.6678   LearningRate 0.0241   Epoch: 10   Global Step: 126520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:41:42,472-Speed 2999.19 samples/sec   Loss 6.5791   LearningRate 0.0241   Epoch: 10   Global Step: 126530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:41:45,790-Speed 3086.88 samples/sec   Loss 6.5730   LearningRate 0.0241   Epoch: 10   Global Step: 126540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:41:49,196-Speed 3007.11 samples/sec   Loss 6.4837   LearningRate 0.0241   Epoch: 10   Global Step: 126550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:41:52,555-Speed 3049.62 samples/sec   Loss 6.5307   LearningRate 0.0241   Epoch: 10   Global Step: 126560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:41:55,902-Speed 3061.10 samples/sec   Loss 6.5437   LearningRate 0.0241   Epoch: 10   Global Step: 126570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 13:41:59,249-Speed 3060.02 samples/sec   Loss 6.6032   LearningRate 0.0241   Epoch: 10   Global Step: 126580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:42:02,580-Speed 3075.45 samples/sec   Loss 6.5684   LearningRate 0.0241   Epoch: 10   Global Step: 126590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:42:05,911-Speed 3075.78 samples/sec   Loss 6.5412   LearningRate 0.0240   Epoch: 10   Global Step: 126600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:42:09,419-Speed 2919.20 samples/sec   Loss 6.6004   LearningRate 0.0240   Epoch: 10   Global Step: 126610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:42:12,781-Speed 3047.87 samples/sec   Loss 6.5816   LearningRate 0.0240   Epoch: 10   Global Step: 126620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:42:16,124-Speed 3063.81 samples/sec   Loss 6.6217   LearningRate 0.0240   Epoch: 10   Global Step: 126630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:42:19,617-Speed 2932.95 samples/sec   Loss 6.4294   LearningRate 0.0240   Epoch: 10   Global Step: 126640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:42:22,942-Speed 3080.50 samples/sec   Loss 6.5965   LearningRate 0.0240   Epoch: 10   Global Step: 126650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:42:26,365-Speed 2992.67 samples/sec   Loss 6.5677   LearningRate 0.0240   Epoch: 10   Global Step: 126660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:42:29,729-Speed 3045.15 samples/sec   Loss 6.5721   LearningRate 0.0240   Epoch: 10   Global Step: 126670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:42:33,063-Speed 3072.66 samples/sec   Loss 6.4920   LearningRate 0.0240   Epoch: 10   Global Step: 126680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:42:36,448-Speed 3025.16 samples/sec   Loss 6.6268   LearningRate 0.0240   Epoch: 10   Global Step: 126690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:42:39,840-Speed 3020.05 samples/sec   Loss 6.6050   LearningRate 0.0240   Epoch: 10   Global Step: 126700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:42:43,264-Speed 2992.11 samples/sec   Loss 6.6251   LearningRate 0.0240   Epoch: 10   Global Step: 126710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:42:46,698-Speed 2982.67 samples/sec   Loss 6.4315   LearningRate 0.0240   Epoch: 10   Global Step: 126720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:42:50,044-Speed 3061.73 samples/sec   Loss 6.6307   LearningRate 0.0240   Epoch: 10   Global Step: 126730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:42:53,439-Speed 3017.06 samples/sec   Loss 6.5870   LearningRate 0.0240   Epoch: 10   Global Step: 126740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:42:56,874-Speed 2981.46 samples/sec   Loss 6.5051   LearningRate 0.0240   Epoch: 10   Global Step: 126750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:43:00,315-Speed 2976.72 samples/sec   Loss 6.5730   LearningRate 0.0240   Epoch: 10   Global Step: 126760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:43:03,755-Speed 2978.08 samples/sec   Loss 6.6559   LearningRate 0.0240   Epoch: 10   Global Step: 126770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:43:07,165-Speed 3003.68 samples/sec   Loss 6.6543   LearningRate 0.0240   Epoch: 10   Global Step: 126780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:43:10,683-Speed 2911.16 samples/sec   Loss 6.5950   LearningRate 0.0240   Epoch: 10   Global Step: 126790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:43:14,130-Speed 2972.08 samples/sec   Loss 6.5930   LearningRate 0.0240   Epoch: 10   Global Step: 126800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:43:17,553-Speed 2991.69 samples/sec   Loss 6.7334   LearningRate 0.0240   Epoch: 10   Global Step: 126810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:43:20,907-Speed 3053.93 samples/sec   Loss 6.5302   LearningRate 0.0240   Epoch: 10   Global Step: 126820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:43:24,278-Speed 3038.79 samples/sec   Loss 6.6522   LearningRate 0.0240   Epoch: 10   Global Step: 126830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 13:43:27,622-Speed 3063.65 samples/sec   Loss 6.5998   LearningRate 0.0240   Epoch: 10   Global Step: 126840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 13:43:31,039-Speed 2997.77 samples/sec   Loss 6.5327   LearningRate 0.0239   Epoch: 10   Global Step: 126850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 13:43:34,471-Speed 2984.30 samples/sec   Loss 6.6503   LearningRate 0.0239   Epoch: 10   Global Step: 126860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 13:43:37,894-Speed 2992.81 samples/sec   Loss 6.6288   LearningRate 0.0239   Epoch: 10   Global Step: 126870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:43:41,227-Speed 3073.34 samples/sec   Loss 6.5731   LearningRate 0.0239   Epoch: 10   Global Step: 126880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:43:44,570-Speed 3064.53 samples/sec   Loss 6.6108   LearningRate 0.0239   Epoch: 10   Global Step: 126890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:43:47,945-Speed 3034.91 samples/sec   Loss 6.5334   LearningRate 0.0239   Epoch: 10   Global Step: 126900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:43:51,441-Speed 2930.12 samples/sec   Loss 6.6256   LearningRate 0.0239   Epoch: 10   Global Step: 126910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:43:54,865-Speed 2991.61 samples/sec   Loss 6.6706   LearningRate 0.0239   Epoch: 10   Global Step: 126920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:43:58,360-Speed 2930.63 samples/sec   Loss 6.6726   LearningRate 0.0239   Epoch: 10   Global Step: 126930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:44:01,801-Speed 2976.82 samples/sec   Loss 6.5860   LearningRate 0.0239   Epoch: 10   Global Step: 126940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:44:05,183-Speed 3028.81 samples/sec   Loss 6.6217   LearningRate 0.0239   Epoch: 10   Global Step: 126950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:44:08,622-Speed 2978.25 samples/sec   Loss 6.6440   LearningRate 0.0239   Epoch: 10   Global Step: 126960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:44:12,055-Speed 2983.59 samples/sec   Loss 6.6083   LearningRate 0.0239   Epoch: 10   Global Step: 126970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:15,456-Speed 3012.12 samples/sec   Loss 6.6211   LearningRate 0.0239   Epoch: 10   Global Step: 126980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:18,857-Speed 3011.94 samples/sec   Loss 6.6406   LearningRate 0.0239   Epoch: 10   Global Step: 126990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:22,285-Speed 2987.37 samples/sec   Loss 6.6422   LearningRate 0.0239   Epoch: 10   Global Step: 127000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:25,655-Speed 3039.62 samples/sec   Loss 6.7185   LearningRate 0.0239   Epoch: 10   Global Step: 127010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:29,075-Speed 2995.49 samples/sec   Loss 6.6729   LearningRate 0.0239   Epoch: 10   Global Step: 127020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:32,507-Speed 2983.70 samples/sec   Loss 6.7529   LearningRate 0.0239   Epoch: 10   Global Step: 127030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:35,860-Speed 3054.77 samples/sec   Loss 6.7620   LearningRate 0.0239   Epoch: 10   Global Step: 127040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:39,251-Speed 3021.36 samples/sec   Loss 6.7446   LearningRate 0.0239   Epoch: 10   Global Step: 127050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:42,667-Speed 2998.58 samples/sec   Loss 6.7133   LearningRate 0.0239   Epoch: 10   Global Step: 127060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:46,156-Speed 2935.39 samples/sec   Loss 6.6538   LearningRate 0.0239   Epoch: 10   Global Step: 127070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 13:44:49,522-Speed 3043.22 samples/sec   Loss 6.6232   LearningRate 0.0239   Epoch: 10   Global Step: 127080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:52,865-Speed 3063.94 samples/sec   Loss 6.7325   LearningRate 0.0239   Epoch: 10   Global Step: 127090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:56,332-Speed 2954.31 samples/sec   Loss 6.7782   LearningRate 0.0239   Epoch: 10   Global Step: 127100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:44:59,730-Speed 3014.37 samples/sec   Loss 6.7065   LearningRate 0.0238   Epoch: 10   Global Step: 127110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:03,205-Speed 2948.20 samples/sec   Loss 6.6192   LearningRate 0.0238   Epoch: 10   Global Step: 127120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:06,596-Speed 3020.27 samples/sec   Loss 6.6021   LearningRate 0.0238   Epoch: 10   Global Step: 127130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:09,997-Speed 3011.77 samples/sec   Loss 6.7369   LearningRate 0.0238   Epoch: 10   Global Step: 127140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:13,468-Speed 2950.96 samples/sec   Loss 6.5920   LearningRate 0.0238   Epoch: 10   Global Step: 127150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:16,838-Speed 3039.94 samples/sec   Loss 6.7874   LearningRate 0.0238   Epoch: 10   Global Step: 127160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:20,197-Speed 3049.83 samples/sec   Loss 6.7061   LearningRate 0.0238   Epoch: 10   Global Step: 127170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:23,686-Speed 2935.57 samples/sec   Loss 6.5805   LearningRate 0.0238   Epoch: 10   Global Step: 127180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 13:45:27,086-Speed 3012.64 samples/sec   Loss 6.6558   LearningRate 0.0238   Epoch: 10   Global Step: 127190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 13:45:30,430-Speed 3063.01 samples/sec   Loss 6.7513   LearningRate 0.0238   Epoch: 10   Global Step: 127200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:33,883-Speed 2966.97 samples/sec   Loss 6.6749   LearningRate 0.0238   Epoch: 10   Global Step: 127210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:37,275-Speed 3019.96 samples/sec   Loss 6.6565   LearningRate 0.0238   Epoch: 10   Global Step: 127220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:40,629-Speed 3053.90 samples/sec   Loss 6.6621   LearningRate 0.0238   Epoch: 10   Global Step: 127230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:44,030-Speed 3011.17 samples/sec   Loss 6.7945   LearningRate 0.0238   Epoch: 10   Global Step: 127240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:47,382-Speed 3055.73 samples/sec   Loss 6.5860   LearningRate 0.0238   Epoch: 10   Global Step: 127250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:50,782-Speed 3013.07 samples/sec   Loss 6.7285   LearningRate 0.0238   Epoch: 10   Global Step: 127260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:54,176-Speed 3017.86 samples/sec   Loss 6.6718   LearningRate 0.0238   Epoch: 10   Global Step: 127270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:45:57,564-Speed 3023.20 samples/sec   Loss 6.6557   LearningRate 0.0238   Epoch: 10   Global Step: 127280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:46:00,888-Speed 3082.07 samples/sec   Loss 6.7167   LearningRate 0.0238   Epoch: 10   Global Step: 127290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:46:04,258-Speed 3039.42 samples/sec   Loss 6.6355   LearningRate 0.0238   Epoch: 10   Global Step: 127300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 13:46:07,589-Speed 3074.80 samples/sec   Loss 6.6695   LearningRate 0.0238   Epoch: 10   Global Step: 127310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:46:10,961-Speed 3037.52 samples/sec   Loss 6.6651   LearningRate 0.0238   Epoch: 10   Global Step: 127320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:46:14,305-Speed 3063.60 samples/sec   Loss 6.8105   LearningRate 0.0238   Epoch: 10   Global Step: 127330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:46:17,684-Speed 3031.17 samples/sec   Loss 6.8253   LearningRate 0.0238   Epoch: 10   Global Step: 127340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:46:21,020-Speed 3070.10 samples/sec   Loss 6.8753   LearningRate 0.0238   Epoch: 10   Global Step: 127350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:46:24,381-Speed 3047.85 samples/sec   Loss 6.6991   LearningRate 0.0237   Epoch: 10   Global Step: 127360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:46:27,718-Speed 3069.70 samples/sec   Loss 6.7485   LearningRate 0.0237   Epoch: 10   Global Step: 127370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:46:31,054-Speed 3070.05 samples/sec   Loss 6.6324   LearningRate 0.0237   Epoch: 10   Global Step: 127380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:46:34,396-Speed 3065.99 samples/sec   Loss 6.7974   LearningRate 0.0237   Epoch: 10   Global Step: 127390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:46:37,840-Speed 2973.42 samples/sec   Loss 6.8166   LearningRate 0.0237   Epoch: 10   Global Step: 127400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:46:41,185-Speed 3061.83 samples/sec   Loss 6.6368   LearningRate 0.0237   Epoch: 10   Global Step: 127410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:46:44,540-Speed 3053.42 samples/sec   Loss 6.6286   LearningRate 0.0237   Epoch: 10   Global Step: 127420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:46:47,932-Speed 3019.97 samples/sec   Loss 6.7430   LearningRate 0.0237   Epoch: 10   Global Step: 127430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:46:51,275-Speed 3064.08 samples/sec   Loss 6.7051   LearningRate 0.0237   Epoch: 10   Global Step: 127440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:46:54,622-Speed 3060.48 samples/sec   Loss 6.7735   LearningRate 0.0237   Epoch: 10   Global Step: 127450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:46:58,012-Speed 3021.88 samples/sec   Loss 6.8179   LearningRate 0.0237   Epoch: 10   Global Step: 127460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:47:01,418-Speed 3007.66 samples/sec   Loss 6.7189   LearningRate 0.0237   Epoch: 10   Global Step: 127470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:47:04,799-Speed 3029.90 samples/sec   Loss 6.7108   LearningRate 0.0237   Epoch: 10   Global Step: 127480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:47:08,151-Speed 3055.70 samples/sec   Loss 6.6849   LearningRate 0.0237   Epoch: 10   Global Step: 127490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:47:11,601-Speed 2968.26 samples/sec   Loss 6.4461   LearningRate 0.0237   Epoch: 10   Global Step: 127500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:47:15,007-Speed 3007.80 samples/sec   Loss 6.7124   LearningRate 0.0237   Epoch: 10   Global Step: 127510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:47:18,335-Speed 3078.51 samples/sec   Loss 6.6616   LearningRate 0.0237   Epoch: 10   Global Step: 127520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:47:21,697-Speed 3046.36 samples/sec   Loss 6.6771   LearningRate 0.0237   Epoch: 10   Global Step: 127530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:47:25,116-Speed 2995.99 samples/sec   Loss 6.8909   LearningRate 0.0237   Epoch: 10   Global Step: 127540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:47:28,500-Speed 3026.77 samples/sec   Loss 6.6828   LearningRate 0.0237   Epoch: 10   Global Step: 127550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:47:31,912-Speed 3002.23 samples/sec   Loss 6.7735   LearningRate 0.0237   Epoch: 10   Global Step: 127560   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:47:35,249-Speed 3069.16 samples/sec   Loss 6.6929   LearningRate 0.0237   Epoch: 10   Global Step: 127570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:47:38,597-Speed 3059.44 samples/sec   Loss 6.6958   LearningRate 0.0237   Epoch: 10   Global Step: 127580   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:47:42,040-Speed 2975.08 samples/sec   Loss 6.7292   LearningRate 0.0237   Epoch: 10   Global Step: 127590   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:47:45,471-Speed 2985.36 samples/sec   Loss 6.7470   LearningRate 0.0237   Epoch: 10   Global Step: 127600   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:47:48,949-Speed 2945.64 samples/sec   Loss 6.6902   LearningRate 0.0237   Epoch: 10   Global Step: 127610   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:47:52,349-Speed 3012.48 samples/sec   Loss 6.8693   LearningRate 0.0236   Epoch: 10   Global Step: 127620   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:47:55,761-Speed 3002.36 samples/sec   Loss 6.7202   LearningRate 0.0236   Epoch: 10   Global Step: 127630   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:47:59,225-Speed 2956.88 samples/sec   Loss 6.8137   LearningRate 0.0236   Epoch: 10   Global Step: 127640   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:48:02,761-Speed 2896.88 samples/sec   Loss 6.7930   LearningRate 0.0236   Epoch: 10   Global Step: 127650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:48:06,227-Speed 2955.35 samples/sec   Loss 6.8168   LearningRate 0.0236   Epoch: 10   Global Step: 127660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:48:09,604-Speed 3032.56 samples/sec   Loss 6.7778   LearningRate 0.0236   Epoch: 10   Global Step: 127670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:48:13,022-Speed 2997.14 samples/sec   Loss 6.7971   LearningRate 0.0236   Epoch: 10   Global Step: 127680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:48:16,441-Speed 2996.47 samples/sec   Loss 6.7495   LearningRate 0.0236   Epoch: 10   Global Step: 127690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:48:19,845-Speed 3009.67 samples/sec   Loss 6.7199   LearningRate 0.0236   Epoch: 10   Global Step: 127700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:48:23,270-Speed 2990.05 samples/sec   Loss 6.7567   LearningRate 0.0236   Epoch: 10   Global Step: 127710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:48:26,715-Speed 2973.12 samples/sec   Loss 6.7710   LearningRate 0.0236   Epoch: 10   Global Step: 127720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:48:30,180-Speed 2956.26 samples/sec   Loss 6.7569   LearningRate 0.0236   Epoch: 10   Global Step: 127730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:48:33,649-Speed 2952.82 samples/sec   Loss 6.7253   LearningRate 0.0236   Epoch: 10   Global Step: 127740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:48:37,012-Speed 3045.44 samples/sec   Loss 6.7741   LearningRate 0.0236   Epoch: 10   Global Step: 127750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:48:40,443-Speed 2986.11 samples/sec   Loss 6.7234   LearningRate 0.0236   Epoch: 10   Global Step: 127760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:48:43,802-Speed 3049.28 samples/sec   Loss 6.7740   LearningRate 0.0236   Epoch: 10   Global Step: 127770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:48:47,174-Speed 3036.78 samples/sec   Loss 6.7623   LearningRate 0.0236   Epoch: 10   Global Step: 127780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:48:50,573-Speed 3013.57 samples/sec   Loss 6.7893   LearningRate 0.0236   Epoch: 10   Global Step: 127790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:48:54,060-Speed 2937.98 samples/sec   Loss 6.8502   LearningRate 0.0236   Epoch: 10   Global Step: 127800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:48:57,422-Speed 3046.83 samples/sec   Loss 6.8182   LearningRate 0.0236   Epoch: 10   Global Step: 127810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:00,821-Speed 3014.12 samples/sec   Loss 6.7535   LearningRate 0.0236   Epoch: 10   Global Step: 127820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:04,267-Speed 2971.80 samples/sec   Loss 6.8717   LearningRate 0.0236   Epoch: 10   Global Step: 127830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:07,654-Speed 3023.92 samples/sec   Loss 6.7252   LearningRate 0.0236   Epoch: 10   Global Step: 127840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:11,070-Speed 2999.09 samples/sec   Loss 6.7770   LearningRate 0.0236   Epoch: 10   Global Step: 127850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:14,451-Speed 3029.66 samples/sec   Loss 6.7028   LearningRate 0.0236   Epoch: 10   Global Step: 127860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:17,893-Speed 2976.01 samples/sec   Loss 6.6952   LearningRate 0.0235   Epoch: 10   Global Step: 127870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:21,280-Speed 3024.45 samples/sec   Loss 6.8163   LearningRate 0.0235   Epoch: 10   Global Step: 127880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:24,707-Speed 2988.74 samples/sec   Loss 6.9531   LearningRate 0.0235   Epoch: 10   Global Step: 127890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:28,091-Speed 3026.84 samples/sec   Loss 6.7527   LearningRate 0.0235   Epoch: 10   Global Step: 127900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:31,558-Speed 2954.02 samples/sec   Loss 6.7075   LearningRate 0.0235   Epoch: 10   Global Step: 127910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:49:34,941-Speed 3027.90 samples/sec   Loss 6.7910   LearningRate 0.0235   Epoch: 10   Global Step: 127920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:49:38,317-Speed 3034.41 samples/sec   Loss 6.8160   LearningRate 0.0235   Epoch: 10   Global Step: 127930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:49:41,671-Speed 3053.27 samples/sec   Loss 6.7467   LearningRate 0.0235   Epoch: 10   Global Step: 127940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:45,154-Speed 2940.95 samples/sec   Loss 6.9076   LearningRate 0.0235   Epoch: 10   Global Step: 127950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:48,653-Speed 2927.75 samples/sec   Loss 6.7608   LearningRate 0.0235   Epoch: 10   Global Step: 127960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:52,027-Speed 3035.86 samples/sec   Loss 6.7906   LearningRate 0.0235   Epoch: 10   Global Step: 127970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:55,504-Speed 2945.79 samples/sec   Loss 6.7781   LearningRate 0.0235   Epoch: 10   Global Step: 127980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:49:58,895-Speed 3021.24 samples/sec   Loss 6.8608   LearningRate 0.0235   Epoch: 10   Global Step: 127990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:50:02,347-Speed 2967.42 samples/sec   Loss 6.8390   LearningRate 0.0235   Epoch: 10   Global Step: 128000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:50:05,717-Speed 3039.13 samples/sec   Loss 6.8900   LearningRate 0.0235   Epoch: 10   Global Step: 128010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:50:09,109-Speed 3019.89 samples/sec   Loss 6.9161   LearningRate 0.0235   Epoch: 10   Global Step: 128020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:50:12,514-Speed 3008.28 samples/sec   Loss 6.8139   LearningRate 0.0235   Epoch: 10   Global Step: 128030   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:50:15,852-Speed 3067.80 samples/sec   Loss 6.7379   LearningRate 0.0235   Epoch: 10   Global Step: 128040   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:50:19,305-Speed 2967.08 samples/sec   Loss 6.7810   LearningRate 0.0235   Epoch: 10   Global Step: 128050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:50:22,660-Speed 3052.89 samples/sec   Loss 6.9030   LearningRate 0.0235   Epoch: 10   Global Step: 128060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:50:26,052-Speed 3019.94 samples/sec   Loss 6.7678   LearningRate 0.0235   Epoch: 10   Global Step: 128070   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:50:29,412-Speed 3049.07 samples/sec   Loss 6.8278   LearningRate 0.0235   Epoch: 10   Global Step: 128080   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:50:32,720-Speed 3096.69 samples/sec   Loss 6.8569   LearningRate 0.0235   Epoch: 10   Global Step: 128090   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:50:36,153-Speed 2982.84 samples/sec   Loss 6.7706   LearningRate 0.0235   Epoch: 10   Global Step: 128100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:50:39,514-Speed 3047.89 samples/sec   Loss 6.7828   LearningRate 0.0235   Epoch: 10   Global Step: 128110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:50:42,817-Speed 3101.91 samples/sec   Loss 6.8905   LearningRate 0.0235   Epoch: 10   Global Step: 128120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:50:46,128-Speed 3093.25 samples/sec   Loss 6.7958   LearningRate 0.0234   Epoch: 10   Global Step: 128130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:50:49,443-Speed 3090.00 samples/sec   Loss 6.8114   LearningRate 0.0234   Epoch: 10   Global Step: 128140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:50:52,789-Speed 3060.75 samples/sec   Loss 6.7669   LearningRate 0.0234   Epoch: 10   Global Step: 128150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:50:56,149-Speed 3049.01 samples/sec   Loss 6.8469   LearningRate 0.0234   Epoch: 10   Global Step: 128160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:50:59,526-Speed 3033.15 samples/sec   Loss 6.7666   LearningRate 0.0234   Epoch: 10   Global Step: 128170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:51:02,892-Speed 3042.76 samples/sec   Loss 6.8537   LearningRate 0.0234   Epoch: 10   Global Step: 128180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:51:06,311-Speed 2996.32 samples/sec   Loss 6.7936   LearningRate 0.0234   Epoch: 10   Global Step: 128190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:51:09,743-Speed 2984.68 samples/sec   Loss 6.7874   LearningRate 0.0234   Epoch: 10   Global Step: 128200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:51:13,100-Speed 3050.44 samples/sec   Loss 6.8008   LearningRate 0.0234   Epoch: 10   Global Step: 128210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:51:16,452-Speed 3056.22 samples/sec   Loss 6.8944   LearningRate 0.0234   Epoch: 10   Global Step: 128220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:51:19,798-Speed 3061.39 samples/sec   Loss 6.9059   LearningRate 0.0234   Epoch: 10   Global Step: 128230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:51:23,179-Speed 3029.01 samples/sec   Loss 6.8353   LearningRate 0.0234   Epoch: 10   Global Step: 128240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:51:26,486-Speed 3097.55 samples/sec   Loss 6.8659   LearningRate 0.0234   Epoch: 10   Global Step: 128250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:51:29,947-Speed 2959.65 samples/sec   Loss 6.9914   LearningRate 0.0234   Epoch: 10   Global Step: 128260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:51:33,420-Speed 2949.58 samples/sec   Loss 6.8156   LearningRate 0.0234   Epoch: 10   Global Step: 128270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:51:36,829-Speed 3004.65 samples/sec   Loss 6.7319   LearningRate 0.0234   Epoch: 10   Global Step: 128280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:51:40,224-Speed 3017.39 samples/sec   Loss 6.9097   LearningRate 0.0234   Epoch: 10   Global Step: 128290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:51:43,553-Speed 3076.67 samples/sec   Loss 6.8184   LearningRate 0.0234   Epoch: 10   Global Step: 128300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:51:46,923-Speed 3039.00 samples/sec   Loss 6.8909   LearningRate 0.0234   Epoch: 10   Global Step: 128310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:51:50,261-Speed 3068.96 samples/sec   Loss 6.8450   LearningRate 0.0234   Epoch: 10   Global Step: 128320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:51:53,731-Speed 2952.23 samples/sec   Loss 6.9126   LearningRate 0.0234   Epoch: 10   Global Step: 128330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:51:57,131-Speed 3012.56 samples/sec   Loss 6.9310   LearningRate 0.0234   Epoch: 10   Global Step: 128340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:52:00,522-Speed 3020.55 samples/sec   Loss 6.9385   LearningRate 0.0234   Epoch: 10   Global Step: 128350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:52:03,867-Speed 3062.63 samples/sec   Loss 6.7788   LearningRate 0.0234   Epoch: 10   Global Step: 128360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:52:07,315-Speed 2970.79 samples/sec   Loss 6.9249   LearningRate 0.0234   Epoch: 10   Global Step: 128370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:52:10,662-Speed 3060.38 samples/sec   Loss 6.9016   LearningRate 0.0233   Epoch: 10   Global Step: 128380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:52:14,105-Speed 2975.18 samples/sec   Loss 6.8308   LearningRate 0.0233   Epoch: 10   Global Step: 128390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:52:17,499-Speed 3018.73 samples/sec   Loss 6.8667   LearningRate 0.0233   Epoch: 10   Global Step: 128400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:52:20,887-Speed 3023.11 samples/sec   Loss 6.7497   LearningRate 0.0233   Epoch: 10   Global Step: 128410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:52:24,262-Speed 3035.27 samples/sec   Loss 6.9269   LearningRate 0.0233   Epoch: 10   Global Step: 128420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:52:27,640-Speed 3032.81 samples/sec   Loss 6.9395   LearningRate 0.0233   Epoch: 10   Global Step: 128430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:52:31,020-Speed 3030.27 samples/sec   Loss 6.9142   LearningRate 0.0233   Epoch: 10   Global Step: 128440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:52:34,399-Speed 3031.36 samples/sec   Loss 6.9053   LearningRate 0.0233   Epoch: 10   Global Step: 128450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:52:37,771-Speed 3037.60 samples/sec   Loss 6.8686   LearningRate 0.0233   Epoch: 10   Global Step: 128460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:52:41,125-Speed 3053.45 samples/sec   Loss 6.9450   LearningRate 0.0233   Epoch: 10   Global Step: 128470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:52:44,505-Speed 3030.95 samples/sec   Loss 6.8497   LearningRate 0.0233   Epoch: 10   Global Step: 128480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:52:47,872-Speed 3041.75 samples/sec   Loss 6.8068   LearningRate 0.0233   Epoch: 10   Global Step: 128490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:52:51,301-Speed 2987.32 samples/sec   Loss 6.8913   LearningRate 0.0233   Epoch: 10   Global Step: 128500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:52:54,686-Speed 3025.70 samples/sec   Loss 6.7886   LearningRate 0.0233   Epoch: 10   Global Step: 128510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:52:58,105-Speed 2996.13 samples/sec   Loss 6.7965   LearningRate 0.0233   Epoch: 10   Global Step: 128520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:53:01,473-Speed 3041.49 samples/sec   Loss 6.8061   LearningRate 0.0233   Epoch: 10   Global Step: 128530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:53:04,899-Speed 2990.19 samples/sec   Loss 6.7809   LearningRate 0.0233   Epoch: 10   Global Step: 128540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:53:08,259-Speed 3048.27 samples/sec   Loss 6.7852   LearningRate 0.0233   Epoch: 10   Global Step: 128550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:53:11,611-Speed 3055.66 samples/sec   Loss 6.8041   LearningRate 0.0233   Epoch: 10   Global Step: 128560   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:53:14,963-Speed 3055.92 samples/sec   Loss 6.7177   LearningRate 0.0233   Epoch: 10   Global Step: 128570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:53:18,400-Speed 2979.58 samples/sec   Loss 6.8679   LearningRate 0.0233   Epoch: 10   Global Step: 128580   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:53:21,900-Speed 2927.26 samples/sec   Loss 6.8977   LearningRate 0.0233   Epoch: 10   Global Step: 128590   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:53:25,384-Speed 2939.26 samples/sec   Loss 6.8560   LearningRate 0.0233   Epoch: 10   Global Step: 128600   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:53:28,784-Speed 3012.98 samples/sec   Loss 6.8544   LearningRate 0.0233   Epoch: 10   Global Step: 128610   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:53:32,237-Speed 2966.20 samples/sec   Loss 6.8048   LearningRate 0.0233   Epoch: 10   Global Step: 128620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:53:35,642-Speed 3007.93 samples/sec   Loss 6.8293   LearningRate 0.0233   Epoch: 10   Global Step: 128630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:53:39,031-Speed 3022.98 samples/sec   Loss 6.8338   LearningRate 0.0232   Epoch: 10   Global Step: 128640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:53:42,370-Speed 3067.36 samples/sec   Loss 6.8560   LearningRate 0.0232   Epoch: 10   Global Step: 128650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:53:45,737-Speed 3042.46 samples/sec   Loss 6.8878   LearningRate 0.0232   Epoch: 10   Global Step: 128660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:53:49,131-Speed 3017.56 samples/sec   Loss 6.7456   LearningRate 0.0232   Epoch: 10   Global Step: 128670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:53:52,623-Speed 2933.39 samples/sec   Loss 6.8645   LearningRate 0.0232   Epoch: 10   Global Step: 128680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:53:56,018-Speed 3016.55 samples/sec   Loss 6.7455   LearningRate 0.0232   Epoch: 10   Global Step: 128690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:53:59,393-Speed 3035.15 samples/sec   Loss 6.8661   LearningRate 0.0232   Epoch: 10   Global Step: 128700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:54:02,764-Speed 3038.41 samples/sec   Loss 6.8158   LearningRate 0.0232   Epoch: 10   Global Step: 128710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:54:06,126-Speed 3046.62 samples/sec   Loss 6.7839   LearningRate 0.0232   Epoch: 10   Global Step: 128720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:54:09,527-Speed 3012.10 samples/sec   Loss 6.8543   LearningRate 0.0232   Epoch: 10   Global Step: 128730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:54:12,973-Speed 2972.35 samples/sec   Loss 6.7645   LearningRate 0.0232   Epoch: 10   Global Step: 128740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:54:16,468-Speed 2930.13 samples/sec   Loss 6.8080   LearningRate 0.0232   Epoch: 10   Global Step: 128750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:54:19,871-Speed 3010.48 samples/sec   Loss 6.9215   LearningRate 0.0232   Epoch: 10   Global Step: 128760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:54:23,369-Speed 2927.64 samples/sec   Loss 6.9507   LearningRate 0.0232   Epoch: 10   Global Step: 128770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:54:26,758-Speed 3022.56 samples/sec   Loss 6.8028   LearningRate 0.0232   Epoch: 10   Global Step: 128780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:54:30,110-Speed 3056.96 samples/sec   Loss 6.8660   LearningRate 0.0232   Epoch: 10   Global Step: 128790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:54:33,496-Speed 3024.37 samples/sec   Loss 6.8027   LearningRate 0.0232   Epoch: 10   Global Step: 128800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:54:36,888-Speed 3019.65 samples/sec   Loss 6.8347   LearningRate 0.0232   Epoch: 10   Global Step: 128810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:54:40,291-Speed 3010.91 samples/sec   Loss 6.8674   LearningRate 0.0232   Epoch: 10   Global Step: 128820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:54:43,656-Speed 3043.62 samples/sec   Loss 6.7509   LearningRate 0.0232   Epoch: 10   Global Step: 128830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:54:47,138-Speed 2941.48 samples/sec   Loss 6.8291   LearningRate 0.0232   Epoch: 10   Global Step: 128840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:54:50,669-Speed 2901.26 samples/sec   Loss 6.7773   LearningRate 0.0232   Epoch: 10   Global Step: 128850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:54:54,109-Speed 2977.63 samples/sec   Loss 6.8717   LearningRate 0.0232   Epoch: 10   Global Step: 128860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:54:57,441-Speed 3074.57 samples/sec   Loss 6.8302   LearningRate 0.0232   Epoch: 10   Global Step: 128870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:55:00,819-Speed 3032.01 samples/sec   Loss 6.9008   LearningRate 0.0232   Epoch: 10   Global Step: 128880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:55:04,211-Speed 3019.48 samples/sec   Loss 6.8491   LearningRate 0.0232   Epoch: 10   Global Step: 128890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:55:07,589-Speed 3032.18 samples/sec   Loss 6.8847   LearningRate 0.0231   Epoch: 10   Global Step: 128900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:55:10,960-Speed 3038.67 samples/sec   Loss 6.7599   LearningRate 0.0231   Epoch: 10   Global Step: 128910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:55:14,328-Speed 3041.03 samples/sec   Loss 6.8096   LearningRate 0.0231   Epoch: 10   Global Step: 128920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:55:17,692-Speed 3044.71 samples/sec   Loss 6.9350   LearningRate 0.0231   Epoch: 10   Global Step: 128930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:55:21,067-Speed 3034.48 samples/sec   Loss 6.9237   LearningRate 0.0231   Epoch: 10   Global Step: 128940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:55:24,437-Speed 3039.90 samples/sec   Loss 6.9202   LearningRate 0.0231   Epoch: 10   Global Step: 128950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:55:27,759-Speed 3083.46 samples/sec   Loss 6.9118   LearningRate 0.0231   Epoch: 10   Global Step: 128960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:55:31,095-Speed 3069.90 samples/sec   Loss 6.7493   LearningRate 0.0231   Epoch: 10   Global Step: 128970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:55:34,612-Speed 2912.61 samples/sec   Loss 6.7719   LearningRate 0.0231   Epoch: 10   Global Step: 128980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:55:37,990-Speed 3032.83 samples/sec   Loss 6.7658   LearningRate 0.0231   Epoch: 10   Global Step: 128990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:55:41,351-Speed 3047.64 samples/sec   Loss 6.8482   LearningRate 0.0231   Epoch: 10   Global Step: 129000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:55:44,724-Speed 3035.86 samples/sec   Loss 6.9224   LearningRate 0.0231   Epoch: 10   Global Step: 129010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:55:48,072-Speed 3059.38 samples/sec   Loss 6.9187   LearningRate 0.0231   Epoch: 10   Global Step: 129020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:55:51,445-Speed 3036.29 samples/sec   Loss 6.8325   LearningRate 0.0231   Epoch: 10   Global Step: 129030   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:55:54,919-Speed 2949.06 samples/sec   Loss 6.7452   LearningRate 0.0231   Epoch: 10   Global Step: 129040   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:55:58,415-Speed 2929.79 samples/sec   Loss 6.8604   LearningRate 0.0231   Epoch: 10   Global Step: 129050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:56:01,796-Speed 3029.23 samples/sec   Loss 6.7620   LearningRate 0.0231   Epoch: 10   Global Step: 129060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:56:05,206-Speed 3004.12 samples/sec   Loss 6.8082   LearningRate 0.0231   Epoch: 10   Global Step: 129070   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:56:08,639-Speed 2983.63 samples/sec   Loss 6.7740   LearningRate 0.0231   Epoch: 10   Global Step: 129080   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:56:11,954-Speed 3089.99 samples/sec   Loss 6.7972   LearningRate 0.0231   Epoch: 10   Global Step: 129090   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 13:56:15,324-Speed 3039.30 samples/sec   Loss 6.8141   LearningRate 0.0231   Epoch: 10   Global Step: 129100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:56:18,761-Speed 2980.41 samples/sec   Loss 6.8878   LearningRate 0.0231   Epoch: 10   Global Step: 129110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:56:22,148-Speed 3024.02 samples/sec   Loss 6.8199   LearningRate 0.0231   Epoch: 10   Global Step: 129120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:56:25,568-Speed 2995.04 samples/sec   Loss 6.9000   LearningRate 0.0231   Epoch: 10   Global Step: 129130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:56:28,971-Speed 3009.99 samples/sec   Loss 6.8500   LearningRate 0.0231   Epoch: 10   Global Step: 129140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:56:32,301-Speed 3075.85 samples/sec   Loss 6.8003   LearningRate 0.0231   Epoch: 10   Global Step: 129150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:56:35,670-Speed 3040.39 samples/sec   Loss 6.8174   LearningRate 0.0230   Epoch: 10   Global Step: 129160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:56:39,055-Speed 3025.60 samples/sec   Loss 6.7203   LearningRate 0.0230   Epoch: 10   Global Step: 129170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:56:42,506-Speed 2967.84 samples/sec   Loss 6.6564   LearningRate 0.0230   Epoch: 10   Global Step: 129180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:56:45,895-Speed 3022.85 samples/sec   Loss 6.9680   LearningRate 0.0230   Epoch: 10   Global Step: 129190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:56:49,245-Speed 3058.06 samples/sec   Loss 6.9370   LearningRate 0.0230   Epoch: 10   Global Step: 129200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:56:52,628-Speed 3028.07 samples/sec   Loss 7.0116   LearningRate 0.0230   Epoch: 10   Global Step: 129210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:56:56,007-Speed 3030.40 samples/sec   Loss 6.8315   LearningRate 0.0230   Epoch: 10   Global Step: 129220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:56:59,357-Speed 3057.73 samples/sec   Loss 6.7695   LearningRate 0.0230   Epoch: 10   Global Step: 129230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:57:02,693-Speed 3071.05 samples/sec   Loss 6.8628   LearningRate 0.0230   Epoch: 10   Global Step: 129240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:57:06,091-Speed 3013.51 samples/sec   Loss 6.9142   LearningRate 0.0230   Epoch: 10   Global Step: 129250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:57:09,473-Speed 3028.73 samples/sec   Loss 6.7615   LearningRate 0.0230   Epoch: 10   Global Step: 129260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:57:12,909-Speed 2981.22 samples/sec   Loss 6.8291   LearningRate 0.0230   Epoch: 10   Global Step: 129270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:57:16,260-Speed 3055.92 samples/sec   Loss 6.8825   LearningRate 0.0230   Epoch: 10   Global Step: 129280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:57:19,610-Speed 3058.12 samples/sec   Loss 6.9551   LearningRate 0.0230   Epoch: 10   Global Step: 129290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:57:22,980-Speed 3039.17 samples/sec   Loss 6.9610   LearningRate 0.0230   Epoch: 10   Global Step: 129300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:57:26,324-Speed 3063.07 samples/sec   Loss 6.9207   LearningRate 0.0230   Epoch: 10   Global Step: 129310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:57:29,725-Speed 3012.29 samples/sec   Loss 6.8758   LearningRate 0.0230   Epoch: 10   Global Step: 129320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:57:33,106-Speed 3029.91 samples/sec   Loss 6.8974   LearningRate 0.0230   Epoch: 10   Global Step: 129330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:57:36,438-Speed 3074.29 samples/sec   Loss 6.9068   LearningRate 0.0230   Epoch: 10   Global Step: 129340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:57:39,831-Speed 3019.07 samples/sec   Loss 6.9367   LearningRate 0.0230   Epoch: 10   Global Step: 129350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:57:43,207-Speed 3034.04 samples/sec   Loss 6.9989   LearningRate 0.0230   Epoch: 10   Global Step: 129360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:57:46,588-Speed 3028.80 samples/sec   Loss 6.7808   LearningRate 0.0230   Epoch: 10   Global Step: 129370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:57:50,005-Speed 2998.05 samples/sec   Loss 6.9134   LearningRate 0.0230   Epoch: 10   Global Step: 129380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:57:53,348-Speed 3064.36 samples/sec   Loss 6.8799   LearningRate 0.0230   Epoch: 10   Global Step: 129390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:57:56,736-Speed 3023.51 samples/sec   Loss 6.9610   LearningRate 0.0230   Epoch: 10   Global Step: 129400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:58:00,195-Speed 2961.78 samples/sec   Loss 6.8840   LearningRate 0.0230   Epoch: 10   Global Step: 129410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:58:03,595-Speed 3011.94 samples/sec   Loss 6.7980   LearningRate 0.0229   Epoch: 10   Global Step: 129420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:58:06,992-Speed 3015.34 samples/sec   Loss 6.9542   LearningRate 0.0229   Epoch: 10   Global Step: 129430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:58:10,462-Speed 2951.78 samples/sec   Loss 6.9315   LearningRate 0.0229   Epoch: 10   Global Step: 129440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:58:13,785-Speed 3082.29 samples/sec   Loss 6.9023   LearningRate 0.0229   Epoch: 10   Global Step: 129450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:58:17,159-Speed 3036.02 samples/sec   Loss 6.8373   LearningRate 0.0229   Epoch: 10   Global Step: 129460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:58:20,495-Speed 3070.83 samples/sec   Loss 6.7742   LearningRate 0.0229   Epoch: 10   Global Step: 129470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:58:23,831-Speed 3070.42 samples/sec   Loss 6.9322   LearningRate 0.0229   Epoch: 10   Global Step: 129480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:58:27,156-Speed 3080.74 samples/sec   Loss 6.8969   LearningRate 0.0229   Epoch: 10   Global Step: 129490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:58:30,508-Speed 3055.34 samples/sec   Loss 6.9609   LearningRate 0.0229   Epoch: 10   Global Step: 129500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:58:33,843-Speed 3071.67 samples/sec   Loss 6.9233   LearningRate 0.0229   Epoch: 10   Global Step: 129510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:58:37,214-Speed 3037.89 samples/sec   Loss 6.7821   LearningRate 0.0229   Epoch: 10   Global Step: 129520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:58:40,596-Speed 3029.67 samples/sec   Loss 6.7414   LearningRate 0.0229   Epoch: 10   Global Step: 129530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:58:43,916-Speed 3084.26 samples/sec   Loss 6.7137   LearningRate 0.0229   Epoch: 10   Global Step: 129540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:58:47,270-Speed 3054.07 samples/sec   Loss 6.8529   LearningRate 0.0229   Epoch: 10   Global Step: 129550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:58:50,689-Speed 2995.97 samples/sec   Loss 6.8772   LearningRate 0.0229   Epoch: 10   Global Step: 129560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:58:54,079-Speed 3021.58 samples/sec   Loss 6.8917   LearningRate 0.0229   Epoch: 10   Global Step: 129570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:58:57,542-Speed 2957.44 samples/sec   Loss 7.0150   LearningRate 0.0229   Epoch: 10   Global Step: 129580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:59:00,994-Speed 2967.25 samples/sec   Loss 6.9088   LearningRate 0.0229   Epoch: 10   Global Step: 129590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:59:04,394-Speed 3012.47 samples/sec   Loss 7.0159   LearningRate 0.0229   Epoch: 10   Global Step: 129600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:59:07,807-Speed 3001.21 samples/sec   Loss 6.8029   LearningRate 0.0229   Epoch: 10   Global Step: 129610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:59:11,194-Speed 3025.25 samples/sec   Loss 6.8119   LearningRate 0.0229   Epoch: 10   Global Step: 129620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:59:14,512-Speed 3087.19 samples/sec   Loss 6.8679   LearningRate 0.0229   Epoch: 10   Global Step: 129630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:59:17,849-Speed 3068.91 samples/sec   Loss 6.9938   LearningRate 0.0229   Epoch: 10   Global Step: 129640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:59:21,263-Speed 3001.23 samples/sec   Loss 6.8717   LearningRate 0.0229   Epoch: 10   Global Step: 129650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 13:59:24,622-Speed 3048.67 samples/sec   Loss 6.8798   LearningRate 0.0229   Epoch: 10   Global Step: 129660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:59:28,004-Speed 3028.68 samples/sec   Loss 6.8677   LearningRate 0.0229   Epoch: 10   Global Step: 129670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:59:31,390-Speed 3025.35 samples/sec   Loss 6.9216   LearningRate 0.0228   Epoch: 10   Global Step: 129680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:59:34,798-Speed 3005.67 samples/sec   Loss 6.8198   LearningRate 0.0228   Epoch: 10   Global Step: 129690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:59:38,227-Speed 2986.45 samples/sec   Loss 6.9358   LearningRate 0.0228   Epoch: 10   Global Step: 129700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:59:41,644-Speed 2997.86 samples/sec   Loss 6.8529   LearningRate 0.0228   Epoch: 10   Global Step: 129710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:59:45,115-Speed 2950.82 samples/sec   Loss 7.0032   LearningRate 0.0228   Epoch: 10   Global Step: 129720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:59:48,518-Speed 3009.93 samples/sec   Loss 6.8796   LearningRate 0.0228   Epoch: 10   Global Step: 129730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:59:51,877-Speed 3049.48 samples/sec   Loss 6.8979   LearningRate 0.0228   Epoch: 10   Global Step: 129740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:59:55,187-Speed 3094.49 samples/sec   Loss 6.8206   LearningRate 0.0228   Epoch: 10   Global Step: 129750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 13:59:58,599-Speed 3001.43 samples/sec   Loss 6.9439   LearningRate 0.0228   Epoch: 10   Global Step: 129760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:00:02,022-Speed 2992.58 samples/sec   Loss 6.9177   LearningRate 0.0228   Epoch: 10   Global Step: 129770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:00:05,400-Speed 3032.80 samples/sec   Loss 6.9222   LearningRate 0.0228   Epoch: 10   Global Step: 129780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:00:08,777-Speed 3032.53 samples/sec   Loss 6.7808   LearningRate 0.0228   Epoch: 10   Global Step: 129790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:00:12,143-Speed 3042.89 samples/sec   Loss 6.8755   LearningRate 0.0228   Epoch: 10   Global Step: 129800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:00:15,578-Speed 2982.19 samples/sec   Loss 6.9765   LearningRate 0.0228   Epoch: 10   Global Step: 129810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:00:18,969-Speed 3020.48 samples/sec   Loss 6.8494   LearningRate 0.0228   Epoch: 10   Global Step: 129820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:00:22,391-Speed 2993.34 samples/sec   Loss 6.9287   LearningRate 0.0228   Epoch: 10   Global Step: 129830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:00:25,825-Speed 2982.90 samples/sec   Loss 6.7738   LearningRate 0.0228   Epoch: 10   Global Step: 129840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:00:29,326-Speed 2925.99 samples/sec   Loss 6.8940   LearningRate 0.0228   Epoch: 10   Global Step: 129850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:00:32,790-Speed 2956.23 samples/sec   Loss 6.9420   LearningRate 0.0228   Epoch: 10   Global Step: 129860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:00:36,283-Speed 2932.39 samples/sec   Loss 6.8703   LearningRate 0.0228   Epoch: 10   Global Step: 129870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:00:39,676-Speed 3019.08 samples/sec   Loss 6.8921   LearningRate 0.0228   Epoch: 10   Global Step: 129880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:00:43,041-Speed 3044.01 samples/sec   Loss 7.0288   LearningRate 0.0228   Epoch: 10   Global Step: 129890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:00:46,452-Speed 3002.64 samples/sec   Loss 6.8266   LearningRate 0.0228   Epoch: 10   Global Step: 129900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:00:49,793-Speed 3066.09 samples/sec   Loss 6.9560   LearningRate 0.0228   Epoch: 10   Global Step: 129910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:00:53,214-Speed 2994.32 samples/sec   Loss 6.9081   LearningRate 0.0228   Epoch: 10   Global Step: 129920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:00:56,629-Speed 2999.80 samples/sec   Loss 6.9866   LearningRate 0.0228   Epoch: 10   Global Step: 129930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:01:00,039-Speed 3003.24 samples/sec   Loss 6.8780   LearningRate 0.0227   Epoch: 10   Global Step: 129940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:01:03,399-Speed 3048.51 samples/sec   Loss 6.8947   LearningRate 0.0227   Epoch: 10   Global Step: 129950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:01:06,773-Speed 3035.22 samples/sec   Loss 6.8261   LearningRate 0.0227   Epoch: 10   Global Step: 129960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:01:10,200-Speed 2989.44 samples/sec   Loss 6.9218   LearningRate 0.0227   Epoch: 10   Global Step: 129970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:01:13,598-Speed 3013.86 samples/sec   Loss 6.7885   LearningRate 0.0227   Epoch: 10   Global Step: 129980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:01:17,002-Speed 3009.31 samples/sec   Loss 6.8251   LearningRate 0.0227   Epoch: 10   Global Step: 129990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:01:20,477-Speed 2947.82 samples/sec   Loss 6.9875   LearningRate 0.0227   Epoch: 10   Global Step: 130000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:01:23,987-Speed 2917.69 samples/sec   Loss 6.7882   LearningRate 0.0227   Epoch: 10   Global Step: 130010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:01:27,391-Speed 3009.21 samples/sec   Loss 6.9032   LearningRate 0.0227   Epoch: 10   Global Step: 130020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:01:30,732-Speed 3066.45 samples/sec   Loss 6.8091   LearningRate 0.0227   Epoch: 10   Global Step: 130030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:01:34,108-Speed 3033.36 samples/sec   Loss 6.8576   LearningRate 0.0227   Epoch: 10   Global Step: 130040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:01:37,509-Speed 3011.78 samples/sec   Loss 6.7796   LearningRate 0.0227   Epoch: 10   Global Step: 130050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:01:40,879-Speed 3039.84 samples/sec   Loss 6.8459   LearningRate 0.0227   Epoch: 10   Global Step: 130060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:01:44,309-Speed 2985.88 samples/sec   Loss 6.7975   LearningRate 0.0227   Epoch: 10   Global Step: 130070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:01:47,662-Speed 3054.26 samples/sec   Loss 6.9739   LearningRate 0.0227   Epoch: 10   Global Step: 130080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:01:51,158-Speed 2930.80 samples/sec   Loss 6.8360   LearningRate 0.0227   Epoch: 10   Global Step: 130090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:01:54,545-Speed 3023.45 samples/sec   Loss 6.8727   LearningRate 0.0227   Epoch: 10   Global Step: 130100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:01:57,906-Speed 3047.74 samples/sec   Loss 6.9845   LearningRate 0.0227   Epoch: 10   Global Step: 130110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:02:01,300-Speed 3017.99 samples/sec   Loss 6.8126   LearningRate 0.0227   Epoch: 10   Global Step: 130120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:02:04,733-Speed 2983.79 samples/sec   Loss 6.8700   LearningRate 0.0227   Epoch: 10   Global Step: 130130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:02:08,079-Speed 3061.47 samples/sec   Loss 6.8780   LearningRate 0.0227   Epoch: 10   Global Step: 130140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:02:11,418-Speed 3067.63 samples/sec   Loss 6.8842   LearningRate 0.0227   Epoch: 10   Global Step: 130150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:02:14,826-Speed 3004.94 samples/sec   Loss 6.8603   LearningRate 0.0227   Epoch: 10   Global Step: 130160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:02:18,198-Speed 3038.50 samples/sec   Loss 6.9488   LearningRate 0.0227   Epoch: 10   Global Step: 130170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:02:21,661-Speed 2957.48 samples/sec   Loss 6.8179   LearningRate 0.0227   Epoch: 10   Global Step: 130180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:02:25,074-Speed 3001.37 samples/sec   Loss 6.6747   LearningRate 0.0227   Epoch: 10   Global Step: 130190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:02:28,480-Speed 3006.83 samples/sec   Loss 6.9741   LearningRate 0.0226   Epoch: 10   Global Step: 130200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:02:31,854-Speed 3036.05 samples/sec   Loss 6.8849   LearningRate 0.0226   Epoch: 10   Global Step: 130210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:02:35,273-Speed 2995.83 samples/sec   Loss 6.8250   LearningRate 0.0226   Epoch: 10   Global Step: 130220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:02:38,684-Speed 3002.60 samples/sec   Loss 6.9238   LearningRate 0.0226   Epoch: 10   Global Step: 130230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:02:42,027-Speed 3063.74 samples/sec   Loss 6.8504   LearningRate 0.0226   Epoch: 10   Global Step: 130240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:02:45,442-Speed 2999.50 samples/sec   Loss 7.0483   LearningRate 0.0226   Epoch: 10   Global Step: 130250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:02:48,855-Speed 3001.41 samples/sec   Loss 6.9445   LearningRate 0.0226   Epoch: 10   Global Step: 130260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:02:52,324-Speed 2952.77 samples/sec   Loss 6.9543   LearningRate 0.0226   Epoch: 10   Global Step: 130270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:02:55,734-Speed 3003.92 samples/sec   Loss 6.9181   LearningRate 0.0226   Epoch: 10   Global Step: 130280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:02:59,156-Speed 2992.74 samples/sec   Loss 6.8827   LearningRate 0.0226   Epoch: 10   Global Step: 130290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:03:02,523-Speed 3042.82 samples/sec   Loss 7.0048   LearningRate 0.0226   Epoch: 10   Global Step: 130300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:03:05,975-Speed 2966.75 samples/sec   Loss 6.8576   LearningRate 0.0226   Epoch: 10   Global Step: 130310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:09,343-Speed 3041.86 samples/sec   Loss 6.8109   LearningRate 0.0226   Epoch: 10   Global Step: 130320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:12,720-Speed 3033.33 samples/sec   Loss 6.8967   LearningRate 0.0226   Epoch: 10   Global Step: 130330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:16,116-Speed 3016.29 samples/sec   Loss 6.8952   LearningRate 0.0226   Epoch: 10   Global Step: 130340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:20,188-Speed 2515.03 samples/sec   Loss 6.8729   LearningRate 0.0226   Epoch: 10   Global Step: 130350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:23,548-Speed 3048.18 samples/sec   Loss 6.9257   LearningRate 0.0226   Epoch: 10   Global Step: 130360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:26,876-Speed 3077.56 samples/sec   Loss 6.9033   LearningRate 0.0226   Epoch: 10   Global Step: 130370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:30,262-Speed 3026.12 samples/sec   Loss 6.9254   LearningRate 0.0226   Epoch: 10   Global Step: 130380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:33,614-Speed 3055.41 samples/sec   Loss 6.7553   LearningRate 0.0226   Epoch: 10   Global Step: 130390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:37,028-Speed 3000.85 samples/sec   Loss 6.7680   LearningRate 0.0226   Epoch: 10   Global Step: 130400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:40,418-Speed 3021.77 samples/sec   Loss 7.0066   LearningRate 0.0226   Epoch: 10   Global Step: 130410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:03:43,721-Speed 3100.61 samples/sec   Loss 6.7791   LearningRate 0.0226   Epoch: 10   Global Step: 130420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:47,154-Speed 2983.94 samples/sec   Loss 6.8295   LearningRate 0.0226   Epoch: 10   Global Step: 130430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:50,520-Speed 3043.51 samples/sec   Loss 6.9588   LearningRate 0.0226   Epoch: 10   Global Step: 130440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:53,901-Speed 3028.94 samples/sec   Loss 6.8625   LearningRate 0.0226   Epoch: 10   Global Step: 130450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:03:57,267-Speed 3043.09 samples/sec   Loss 6.9256   LearningRate 0.0225   Epoch: 10   Global Step: 130460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:00,701-Speed 2983.03 samples/sec   Loss 6.8585   LearningRate 0.0225   Epoch: 10   Global Step: 130470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:04,067-Speed 3043.14 samples/sec   Loss 6.7715   LearningRate 0.0225   Epoch: 10   Global Step: 130480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:07,463-Speed 3015.83 samples/sec   Loss 6.9879   LearningRate 0.0225   Epoch: 10   Global Step: 130490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:10,872-Speed 3004.64 samples/sec   Loss 6.9098   LearningRate 0.0225   Epoch: 10   Global Step: 130500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:14,300-Speed 2988.77 samples/sec   Loss 6.7659   LearningRate 0.0225   Epoch: 10   Global Step: 130510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:17,694-Speed 3017.18 samples/sec   Loss 6.7847   LearningRate 0.0225   Epoch: 10   Global Step: 130520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:04:21,053-Speed 3049.94 samples/sec   Loss 6.8471   LearningRate 0.0225   Epoch: 10   Global Step: 130530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:24,408-Speed 3052.76 samples/sec   Loss 6.9513   LearningRate 0.0225   Epoch: 10   Global Step: 130540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:27,784-Speed 3033.84 samples/sec   Loss 6.7617   LearningRate 0.0225   Epoch: 10   Global Step: 130550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:31,226-Speed 2976.57 samples/sec   Loss 6.8879   LearningRate 0.0225   Epoch: 10   Global Step: 130560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:34,585-Speed 3049.33 samples/sec   Loss 6.8876   LearningRate 0.0225   Epoch: 10   Global Step: 130570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:37,954-Speed 3040.49 samples/sec   Loss 6.7683   LearningRate 0.0225   Epoch: 10   Global Step: 130580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:41,438-Speed 2939.79 samples/sec   Loss 6.9117   LearningRate 0.0225   Epoch: 10   Global Step: 130590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:44,924-Speed 2938.75 samples/sec   Loss 6.8997   LearningRate 0.0225   Epoch: 10   Global Step: 130600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:48,336-Speed 3001.62 samples/sec   Loss 6.9291   LearningRate 0.0225   Epoch: 10   Global Step: 130610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:51,731-Speed 3017.07 samples/sec   Loss 6.9665   LearningRate 0.0225   Epoch: 10   Global Step: 130620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:04:55,137-Speed 3007.67 samples/sec   Loss 6.8215   LearningRate 0.0225   Epoch: 10   Global Step: 130630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:04:58,555-Speed 2996.62 samples/sec   Loss 6.8328   LearningRate 0.0225   Epoch: 10   Global Step: 130640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:01,914-Speed 3050.10 samples/sec   Loss 6.8802   LearningRate 0.0225   Epoch: 10   Global Step: 130650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:05,340-Speed 2988.72 samples/sec   Loss 6.8073   LearningRate 0.0225   Epoch: 10   Global Step: 130660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:08,766-Speed 2990.40 samples/sec   Loss 6.8216   LearningRate 0.0225   Epoch: 10   Global Step: 130670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:12,248-Speed 2941.82 samples/sec   Loss 6.8056   LearningRate 0.0225   Epoch: 10   Global Step: 130680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:15,596-Speed 3059.57 samples/sec   Loss 6.7040   LearningRate 0.0225   Epoch: 10   Global Step: 130690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:19,008-Speed 3002.23 samples/sec   Loss 6.7683   LearningRate 0.0225   Epoch: 10   Global Step: 130700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:22,465-Speed 2962.44 samples/sec   Loss 6.8949   LearningRate 0.0225   Epoch: 10   Global Step: 130710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:25,821-Speed 3052.03 samples/sec   Loss 6.8141   LearningRate 0.0224   Epoch: 10   Global Step: 130720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:29,311-Speed 2934.70 samples/sec   Loss 6.9719   LearningRate 0.0224   Epoch: 10   Global Step: 130730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:32,725-Speed 3000.95 samples/sec   Loss 6.9586   LearningRate 0.0224   Epoch: 10   Global Step: 130740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:36,064-Speed 3067.51 samples/sec   Loss 6.8089   LearningRate 0.0224   Epoch: 10   Global Step: 130750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:39,440-Speed 3034.55 samples/sec   Loss 6.8535   LearningRate 0.0224   Epoch: 10   Global Step: 130760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:42,878-Speed 2979.17 samples/sec   Loss 6.7857   LearningRate 0.0224   Epoch: 10   Global Step: 130770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:46,280-Speed 3010.79 samples/sec   Loss 6.9298   LearningRate 0.0224   Epoch: 10   Global Step: 130780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:49,721-Speed 2976.18 samples/sec   Loss 6.9237   LearningRate 0.0224   Epoch: 10   Global Step: 130790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:53,094-Speed 3037.25 samples/sec   Loss 6.7774   LearningRate 0.0224   Epoch: 10   Global Step: 130800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:56,498-Speed 3008.57 samples/sec   Loss 6.7501   LearningRate 0.0224   Epoch: 10   Global Step: 130810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:05:59,928-Speed 2986.33 samples/sec   Loss 6.9391   LearningRate 0.0224   Epoch: 10   Global Step: 130820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:06:03,359-Speed 2986.80 samples/sec   Loss 6.8448   LearningRate 0.0224   Epoch: 10   Global Step: 130830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:06,687-Speed 3077.16 samples/sec   Loss 6.8919   LearningRate 0.0224   Epoch: 10   Global Step: 130840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:10,118-Speed 2985.47 samples/sec   Loss 6.8162   LearningRate 0.0224   Epoch: 10   Global Step: 130850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:13,550-Speed 2985.15 samples/sec   Loss 6.9105   LearningRate 0.0224   Epoch: 10   Global Step: 130860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:16,925-Speed 3034.56 samples/sec   Loss 6.9192   LearningRate 0.0224   Epoch: 10   Global Step: 130870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:20,357-Speed 2984.65 samples/sec   Loss 6.7839   LearningRate 0.0224   Epoch: 10   Global Step: 130880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:23,834-Speed 2945.58 samples/sec   Loss 6.9118   LearningRate 0.0224   Epoch: 10   Global Step: 130890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:27,208-Speed 3035.70 samples/sec   Loss 7.0228   LearningRate 0.0224   Epoch: 10   Global Step: 130900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:30,572-Speed 3045.23 samples/sec   Loss 6.9226   LearningRate 0.0224   Epoch: 10   Global Step: 130910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:33,960-Speed 3023.76 samples/sec   Loss 6.9295   LearningRate 0.0224   Epoch: 10   Global Step: 130920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:37,361-Speed 3012.16 samples/sec   Loss 6.8987   LearningRate 0.0224   Epoch: 10   Global Step: 130930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:40,798-Speed 2979.84 samples/sec   Loss 6.8970   LearningRate 0.0224   Epoch: 10   Global Step: 130940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:44,127-Speed 3076.62 samples/sec   Loss 6.8499   LearningRate 0.0224   Epoch: 10   Global Step: 130950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:47,535-Speed 3005.63 samples/sec   Loss 6.8687   LearningRate 0.0224   Epoch: 10   Global Step: 130960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:50,920-Speed 3026.70 samples/sec   Loss 6.9121   LearningRate 0.0224   Epoch: 10   Global Step: 130970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:54,359-Speed 2978.20 samples/sec   Loss 6.8622   LearningRate 0.0223   Epoch: 10   Global Step: 130980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:06:57,749-Speed 3021.47 samples/sec   Loss 6.8345   LearningRate 0.0223   Epoch: 10   Global Step: 130990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:07:01,185-Speed 2981.32 samples/sec   Loss 6.9291   LearningRate 0.0223   Epoch: 10   Global Step: 131000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:07:04,513-Speed 3077.95 samples/sec   Loss 6.8845   LearningRate 0.0223   Epoch: 10   Global Step: 131010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:07:07,905-Speed 3019.96 samples/sec   Loss 6.7819   LearningRate 0.0223   Epoch: 10   Global Step: 131020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:07:11,246-Speed 3065.98 samples/sec   Loss 7.0017   LearningRate 0.0223   Epoch: 10   Global Step: 131030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:07:14,623-Speed 3033.07 samples/sec   Loss 6.9112   LearningRate 0.0223   Epoch: 10   Global Step: 131040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:07:18,094-Speed 2950.67 samples/sec   Loss 6.8023   LearningRate 0.0223   Epoch: 10   Global Step: 131050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:07:21,566-Speed 2950.08 samples/sec   Loss 6.8574   LearningRate 0.0223   Epoch: 10   Global Step: 131060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:07:24,970-Speed 3008.65 samples/sec   Loss 6.8634   LearningRate 0.0223   Epoch: 10   Global Step: 131070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:07:28,335-Speed 3044.51 samples/sec   Loss 6.8353   LearningRate 0.0223   Epoch: 10   Global Step: 131080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:07:31,767-Speed 2984.69 samples/sec   Loss 6.9560   LearningRate 0.0223   Epoch: 10   Global Step: 131090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:07:35,172-Speed 3008.58 samples/sec   Loss 6.7534   LearningRate 0.0223   Epoch: 10   Global Step: 131100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:07:38,527-Speed 3053.22 samples/sec   Loss 6.8752   LearningRate 0.0223   Epoch: 10   Global Step: 131110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:07:41,964-Speed 2979.60 samples/sec   Loss 6.7722   LearningRate 0.0223   Epoch: 10   Global Step: 131120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:07:45,312-Speed 3059.16 samples/sec   Loss 6.7481   LearningRate 0.0223   Epoch: 10   Global Step: 131130   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:07:48,665-Speed 3055.71 samples/sec   Loss 6.9792   LearningRate 0.0223   Epoch: 10   Global Step: 131140   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:07:52,014-Speed 3058.35 samples/sec   Loss 6.8530   LearningRate 0.0223   Epoch: 10   Global Step: 131150   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:07:55,439-Speed 2990.15 samples/sec   Loss 6.8731   LearningRate 0.0223   Epoch: 10   Global Step: 131160   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:07:58,796-Speed 3051.99 samples/sec   Loss 6.8346   LearningRate 0.0223   Epoch: 10   Global Step: 131170   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:08:02,108-Speed 3092.31 samples/sec   Loss 6.9015   LearningRate 0.0223   Epoch: 10   Global Step: 131180   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:08:05,495-Speed 3023.87 samples/sec   Loss 6.8558   LearningRate 0.0223   Epoch: 10   Global Step: 131190   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:08:08,900-Speed 3008.45 samples/sec   Loss 6.8759   LearningRate 0.0223   Epoch: 10   Global Step: 131200   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:08:12,311-Speed 3002.87 samples/sec   Loss 6.9091   LearningRate 0.0223   Epoch: 10   Global Step: 131210   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:08:15,738-Speed 2988.97 samples/sec   Loss 6.9026   LearningRate 0.0223   Epoch: 10   Global Step: 131220   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:08:19,101-Speed 3045.43 samples/sec   Loss 6.8226   LearningRate 0.0223   Epoch: 10   Global Step: 131230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:08:22,520-Speed 2996.06 samples/sec   Loss 6.9322   LearningRate 0.0223   Epoch: 10   Global Step: 131240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:08:25,921-Speed 3011.40 samples/sec   Loss 6.8383   LearningRate 0.0222   Epoch: 10   Global Step: 131250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:08:29,342-Speed 2994.88 samples/sec   Loss 6.8218   LearningRate 0.0222   Epoch: 10   Global Step: 131260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:08:32,807-Speed 2956.07 samples/sec   Loss 6.8107   LearningRate 0.0222   Epoch: 10   Global Step: 131270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:08:36,120-Speed 3091.32 samples/sec   Loss 6.8079   LearningRate 0.0222   Epoch: 10   Global Step: 131280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:08:39,533-Speed 3000.81 samples/sec   Loss 6.8882   LearningRate 0.0222   Epoch: 10   Global Step: 131290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:08:42,960-Speed 2989.34 samples/sec   Loss 6.8428   LearningRate 0.0222   Epoch: 10   Global Step: 131300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:08:46,348-Speed 3023.27 samples/sec   Loss 6.8929   LearningRate 0.0222   Epoch: 10   Global Step: 131310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:08:49,729-Speed 3029.12 samples/sec   Loss 6.8684   LearningRate 0.0222   Epoch: 10   Global Step: 131320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:08:53,082-Speed 3055.41 samples/sec   Loss 6.7379   LearningRate 0.0222   Epoch: 10   Global Step: 131330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:08:56,419-Speed 3069.17 samples/sec   Loss 6.8565   LearningRate 0.0222   Epoch: 10   Global Step: 131340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:08:59,840-Speed 2994.24 samples/sec   Loss 6.8331   LearningRate 0.0222   Epoch: 10   Global Step: 131350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:03,175-Speed 3071.21 samples/sec   Loss 6.8282   LearningRate 0.0222   Epoch: 10   Global Step: 131360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:06,580-Speed 3008.05 samples/sec   Loss 6.7656   LearningRate 0.0222   Epoch: 10   Global Step: 131370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:10,004-Speed 2991.92 samples/sec   Loss 6.7849   LearningRate 0.0222   Epoch: 10   Global Step: 131380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:13,401-Speed 3014.87 samples/sec   Loss 6.9297   LearningRate 0.0222   Epoch: 10   Global Step: 131390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:16,819-Speed 2997.07 samples/sec   Loss 6.8753   LearningRate 0.0222   Epoch: 10   Global Step: 131400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:20,168-Speed 3058.22 samples/sec   Loss 6.8885   LearningRate 0.0222   Epoch: 10   Global Step: 131410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:23,583-Speed 2999.13 samples/sec   Loss 6.8584   LearningRate 0.0222   Epoch: 10   Global Step: 131420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:26,958-Speed 3036.16 samples/sec   Loss 6.8351   LearningRate 0.0222   Epoch: 10   Global Step: 131430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:09:30,383-Speed 2991.46 samples/sec   Loss 6.9201   LearningRate 0.0222   Epoch: 10   Global Step: 131440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:33,828-Speed 2973.32 samples/sec   Loss 6.8819   LearningRate 0.0222   Epoch: 10   Global Step: 131450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:37,170-Speed 3064.58 samples/sec   Loss 6.8384   LearningRate 0.0222   Epoch: 10   Global Step: 131460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:40,654-Speed 2939.84 samples/sec   Loss 6.6811   LearningRate 0.0222   Epoch: 10   Global Step: 131470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:44,055-Speed 3011.83 samples/sec   Loss 6.9157   LearningRate 0.0222   Epoch: 10   Global Step: 131480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:47,430-Speed 3034.84 samples/sec   Loss 6.7902   LearningRate 0.0222   Epoch: 10   Global Step: 131490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:50,887-Speed 2962.92 samples/sec   Loss 6.9133   LearningRate 0.0222   Epoch: 10   Global Step: 131500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:54,265-Speed 3032.75 samples/sec   Loss 6.7060   LearningRate 0.0221   Epoch: 10   Global Step: 131510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:09:57,716-Speed 2967.40 samples/sec   Loss 6.8114   LearningRate 0.0221   Epoch: 10   Global Step: 131520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:10:01,179-Speed 2958.52 samples/sec   Loss 6.8004   LearningRate 0.0221   Epoch: 10   Global Step: 131530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:10:04,584-Speed 3008.03 samples/sec   Loss 6.8132   LearningRate 0.0221   Epoch: 10   Global Step: 131540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:10:07,947-Speed 3046.13 samples/sec   Loss 6.8649   LearningRate 0.0221   Epoch: 10   Global Step: 131550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:10:11,360-Speed 3001.30 samples/sec   Loss 6.9309   LearningRate 0.0221   Epoch: 10   Global Step: 131560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:10:14,752-Speed 3019.57 samples/sec   Loss 6.8026   LearningRate 0.0221   Epoch: 10   Global Step: 131570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:10:18,104-Speed 3055.65 samples/sec   Loss 6.7002   LearningRate 0.0221   Epoch: 10   Global Step: 131580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:10:21,481-Speed 3033.59 samples/sec   Loss 6.8122   LearningRate 0.0221   Epoch: 10   Global Step: 131590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:10:24,851-Speed 3039.36 samples/sec   Loss 6.9294   LearningRate 0.0221   Epoch: 10   Global Step: 131600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:10:28,273-Speed 2993.14 samples/sec   Loss 6.7802   LearningRate 0.0221   Epoch: 10   Global Step: 131610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:10:31,614-Speed 3065.62 samples/sec   Loss 6.8427   LearningRate 0.0221   Epoch: 10   Global Step: 131620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:10:35,100-Speed 2938.26 samples/sec   Loss 6.8235   LearningRate 0.0221   Epoch: 10   Global Step: 131630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:10:38,562-Speed 2959.07 samples/sec   Loss 6.8087   LearningRate 0.0221   Epoch: 10   Global Step: 131640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:10:42,101-Speed 2894.02 samples/sec   Loss 6.9583   LearningRate 0.0221   Epoch: 10   Global Step: 131650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:10:45,470-Speed 3040.65 samples/sec   Loss 6.9293   LearningRate 0.0221   Epoch: 10   Global Step: 131660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:10:48,910-Speed 2977.54 samples/sec   Loss 6.8874   LearningRate 0.0221   Epoch: 10   Global Step: 131670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:10:52,250-Speed 3066.82 samples/sec   Loss 6.8373   LearningRate 0.0221   Epoch: 10   Global Step: 131680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:10:55,678-Speed 2987.69 samples/sec   Loss 6.8451   LearningRate 0.0221   Epoch: 10   Global Step: 131690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:10:59,110-Speed 2984.63 samples/sec   Loss 6.7951   LearningRate 0.0221   Epoch: 10   Global Step: 131700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:11:02,501-Speed 3020.66 samples/sec   Loss 6.8493   LearningRate 0.0221   Epoch: 10   Global Step: 131710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:11:05,986-Speed 2938.69 samples/sec   Loss 6.7566   LearningRate 0.0221   Epoch: 10   Global Step: 131720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:11:09,400-Speed 3000.58 samples/sec   Loss 6.7851   LearningRate 0.0221   Epoch: 10   Global Step: 131730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:11:12,770-Speed 3039.75 samples/sec   Loss 6.8628   LearningRate 0.0221   Epoch: 10   Global Step: 131740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:11:16,121-Speed 3056.57 samples/sec   Loss 6.9625   LearningRate 0.0221   Epoch: 10   Global Step: 131750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:11:19,479-Speed 3050.23 samples/sec   Loss 6.9079   LearningRate 0.0221   Epoch: 10   Global Step: 131760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:11:22,861-Speed 3028.81 samples/sec   Loss 6.7048   LearningRate 0.0220   Epoch: 10   Global Step: 131770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:11:26,227-Speed 3043.22 samples/sec   Loss 6.6788   LearningRate 0.0220   Epoch: 10   Global Step: 131780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:11:29,562-Speed 3070.53 samples/sec   Loss 6.8213   LearningRate 0.0220   Epoch: 10   Global Step: 131790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:11:32,923-Speed 3047.89 samples/sec   Loss 6.8004   LearningRate 0.0220   Epoch: 10   Global Step: 131800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:11:36,363-Speed 2977.65 samples/sec   Loss 6.7615   LearningRate 0.0220   Epoch: 10   Global Step: 131810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:11:39,810-Speed 2971.91 samples/sec   Loss 6.8130   LearningRate 0.0220   Epoch: 10   Global Step: 131820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:11:43,285-Speed 2948.00 samples/sec   Loss 6.8460   LearningRate 0.0220   Epoch: 10   Global Step: 131830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:11:46,664-Speed 3031.02 samples/sec   Loss 6.8960   LearningRate 0.0220   Epoch: 10   Global Step: 131840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:11:50,111-Speed 2971.09 samples/sec   Loss 6.8434   LearningRate 0.0220   Epoch: 10   Global Step: 131850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:11:53,466-Speed 3052.95 samples/sec   Loss 6.9250   LearningRate 0.0220   Epoch: 10   Global Step: 131860   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:11:56,823-Speed 3051.31 samples/sec   Loss 6.8170   LearningRate 0.0220   Epoch: 10   Global Step: 131870   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:12:00,162-Speed 3067.59 samples/sec   Loss 6.9806   LearningRate 0.0220   Epoch: 10   Global Step: 131880   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:12:03,594-Speed 2984.52 samples/sec   Loss 6.8387   LearningRate 0.0220   Epoch: 10   Global Step: 131890   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:12:07,011-Speed 2997.55 samples/sec   Loss 6.9450   LearningRate 0.0220   Epoch: 10   Global Step: 131900   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:12:10,405-Speed 3017.68 samples/sec   Loss 6.8887   LearningRate 0.0220   Epoch: 10   Global Step: 131910   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:12:13,819-Speed 3001.08 samples/sec   Loss 6.8442   LearningRate 0.0220   Epoch: 10   Global Step: 131920   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:12:17,255-Speed 2980.93 samples/sec   Loss 6.9988   LearningRate 0.0220   Epoch: 10   Global Step: 131930   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:12:20,595-Speed 3066.50 samples/sec   Loss 6.7338   LearningRate 0.0220   Epoch: 10   Global Step: 131940   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:12:24,038-Speed 2974.70 samples/sec   Loss 6.8050   LearningRate 0.0220   Epoch: 10   Global Step: 131950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:12:27,408-Speed 3039.43 samples/sec   Loss 6.7854   LearningRate 0.0220   Epoch: 10   Global Step: 131960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:12:30,810-Speed 3010.68 samples/sec   Loss 6.7569   LearningRate 0.0220   Epoch: 10   Global Step: 131970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:12:34,157-Speed 3060.84 samples/sec   Loss 6.7900   LearningRate 0.0220   Epoch: 10   Global Step: 131980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:12:37,463-Speed 3098.54 samples/sec   Loss 6.8283   LearningRate 0.0220   Epoch: 10   Global Step: 131990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:12:40,845-Speed 3028.66 samples/sec   Loss 6.8878   LearningRate 0.0220   Epoch: 10   Global Step: 132000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:12:44,232-Speed 3024.14 samples/sec   Loss 6.9255   LearningRate 0.0220   Epoch: 10   Global Step: 132010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:12:47,592-Speed 3047.78 samples/sec   Loss 7.0177   LearningRate 0.0220   Epoch: 10   Global Step: 132020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:12:50,942-Speed 3057.90 samples/sec   Loss 6.8394   LearningRate 0.0220   Epoch: 10   Global Step: 132030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:12:54,284-Speed 3065.16 samples/sec   Loss 6.7688   LearningRate 0.0219   Epoch: 10   Global Step: 132040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:12:57,649-Speed 3044.56 samples/sec   Loss 6.8090   LearningRate 0.0219   Epoch: 10   Global Step: 132050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:13:01,031-Speed 3028.55 samples/sec   Loss 6.8480   LearningRate 0.0219   Epoch: 10   Global Step: 132060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:13:04,456-Speed 2990.29 samples/sec   Loss 7.0112   LearningRate 0.0219   Epoch: 10   Global Step: 132070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:13:07,778-Speed 3083.28 samples/sec   Loss 6.7838   LearningRate 0.0219   Epoch: 10   Global Step: 132080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:13:11,198-Speed 2995.03 samples/sec   Loss 6.7116   LearningRate 0.0219   Epoch: 10   Global Step: 132090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:13:14,628-Speed 2986.53 samples/sec   Loss 6.8301   LearningRate 0.0219   Epoch: 10   Global Step: 132100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:13:18,013-Speed 3025.42 samples/sec   Loss 6.9978   LearningRate 0.0219   Epoch: 10   Global Step: 132110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:13:21,383-Speed 3039.81 samples/sec   Loss 6.8223   LearningRate 0.0219   Epoch: 10   Global Step: 132120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:13:24,740-Speed 3051.53 samples/sec   Loss 6.9158   LearningRate 0.0219   Epoch: 10   Global Step: 132130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:13:28,107-Speed 3042.32 samples/sec   Loss 6.9006   LearningRate 0.0219   Epoch: 10   Global Step: 132140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:13:31,431-Speed 3081.29 samples/sec   Loss 6.8815   LearningRate 0.0219   Epoch: 10   Global Step: 132150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:13:34,742-Speed 3093.59 samples/sec   Loss 6.8281   LearningRate 0.0219   Epoch: 10   Global Step: 132160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:13:38,112-Speed 3040.22 samples/sec   Loss 6.8908   LearningRate 0.0219   Epoch: 10   Global Step: 132170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:13:41,476-Speed 3044.88 samples/sec   Loss 6.8044   LearningRate 0.0219   Epoch: 10   Global Step: 132180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:13:44,806-Speed 3075.76 samples/sec   Loss 6.9394   LearningRate 0.0219   Epoch: 10   Global Step: 132190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:13:48,150-Speed 3062.88 samples/sec   Loss 6.8868   LearningRate 0.0219   Epoch: 10   Global Step: 132200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:13:51,542-Speed 3019.73 samples/sec   Loss 6.8857   LearningRate 0.0219   Epoch: 10   Global Step: 132210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:13:54,900-Speed 3050.40 samples/sec   Loss 6.9623   LearningRate 0.0219   Epoch: 10   Global Step: 132220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:13:58,267-Speed 3042.68 samples/sec   Loss 6.8581   LearningRate 0.0219   Epoch: 10   Global Step: 132230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:14:01,579-Speed 3092.04 samples/sec   Loss 6.7752   LearningRate 0.0219   Epoch: 10   Global Step: 132240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:14:04,932-Speed 3055.40 samples/sec   Loss 6.9038   LearningRate 0.0219   Epoch: 10   Global Step: 132250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:14:08,268-Speed 3074.25 samples/sec   Loss 6.7468   LearningRate 0.0219   Epoch: 10   Global Step: 132260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:14:11,577-Speed 3095.80 samples/sec   Loss 6.7122   LearningRate 0.0219   Epoch: 10   Global Step: 132270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:14:14,982-Speed 3008.37 samples/sec   Loss 6.7064   LearningRate 0.0219   Epoch: 10   Global Step: 132280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:14:18,407-Speed 2990.38 samples/sec   Loss 6.8850   LearningRate 0.0219   Epoch: 10   Global Step: 132290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:14:21,819-Speed 3002.23 samples/sec   Loss 6.9194   LearningRate 0.0218   Epoch: 10   Global Step: 132300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:14:25,150-Speed 3074.92 samples/sec   Loss 6.7713   LearningRate 0.0218   Epoch: 10   Global Step: 132310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:14:28,580-Speed 2987.15 samples/sec   Loss 6.7688   LearningRate 0.0218   Epoch: 10   Global Step: 132320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:14:31,948-Speed 3040.44 samples/sec   Loss 6.8214   LearningRate 0.0218   Epoch: 10   Global Step: 132330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:14:35,404-Speed 2964.04 samples/sec   Loss 6.8761   LearningRate 0.0218   Epoch: 10   Global Step: 132340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:14:38,790-Speed 3025.02 samples/sec   Loss 6.7020   LearningRate 0.0218   Epoch: 10   Global Step: 132350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:14:42,143-Speed 3054.99 samples/sec   Loss 6.9856   LearningRate 0.0218   Epoch: 10   Global Step: 132360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:14:45,625-Speed 2941.43 samples/sec   Loss 6.7723   LearningRate 0.0218   Epoch: 10   Global Step: 132370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:14:49,107-Speed 2942.26 samples/sec   Loss 6.8242   LearningRate 0.0218   Epoch: 10   Global Step: 132380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:14:52,451-Speed 3062.62 samples/sec   Loss 6.8455   LearningRate 0.0218   Epoch: 10   Global Step: 132390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:14:55,912-Speed 2959.53 samples/sec   Loss 6.7429   LearningRate 0.0218   Epoch: 10   Global Step: 132400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:14:59,311-Speed 3013.56 samples/sec   Loss 6.7222   LearningRate 0.0218   Epoch: 10   Global Step: 132410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:15:02,739-Speed 2988.39 samples/sec   Loss 6.8337   LearningRate 0.0218   Epoch: 10   Global Step: 132420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:15:06,099-Speed 3048.29 samples/sec   Loss 6.9687   LearningRate 0.0218   Epoch: 10   Global Step: 132430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:15:09,530-Speed 2985.44 samples/sec   Loss 6.8136   LearningRate 0.0218   Epoch: 10   Global Step: 132440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:15:12,902-Speed 3037.47 samples/sec   Loss 6.8822   LearningRate 0.0218   Epoch: 10   Global Step: 132450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:15:16,330-Speed 2988.80 samples/sec   Loss 6.7557   LearningRate 0.0218   Epoch: 10   Global Step: 132460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:15:19,700-Speed 3039.31 samples/sec   Loss 6.8619   LearningRate 0.0218   Epoch: 10   Global Step: 132470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:15:23,067-Speed 3042.31 samples/sec   Loss 6.8470   LearningRate 0.0218   Epoch: 10   Global Step: 132480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:15:26,442-Speed 3033.98 samples/sec   Loss 6.8058   LearningRate 0.0218   Epoch: 10   Global Step: 132490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:15:29,854-Speed 3002.80 samples/sec   Loss 6.8234   LearningRate 0.0218   Epoch: 10   Global Step: 132500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:15:33,206-Speed 3054.98 samples/sec   Loss 6.7498   LearningRate 0.0218   Epoch: 10   Global Step: 132510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:15:36,679-Speed 2949.50 samples/sec   Loss 6.9078   LearningRate 0.0218   Epoch: 10   Global Step: 132520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:15:40,151-Speed 2950.01 samples/sec   Loss 6.6132   LearningRate 0.0218   Epoch: 10   Global Step: 132530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:15:43,551-Speed 3013.50 samples/sec   Loss 6.7126   LearningRate 0.0218   Epoch: 10   Global Step: 132540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:15:46,954-Speed 3009.93 samples/sec   Loss 6.7243   LearningRate 0.0218   Epoch: 10   Global Step: 132550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:15:50,375-Speed 2994.65 samples/sec   Loss 6.9208   LearningRate 0.0218   Epoch: 10   Global Step: 132560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:15:53,863-Speed 2936.54 samples/sec   Loss 6.7805   LearningRate 0.0217   Epoch: 10   Global Step: 132570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:15:57,229-Speed 3042.92 samples/sec   Loss 6.8814   LearningRate 0.0217   Epoch: 10   Global Step: 132580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:16:00,556-Speed 3078.26 samples/sec   Loss 6.7638   LearningRate 0.0217   Epoch: 10   Global Step: 132590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:16:04,025-Speed 2952.79 samples/sec   Loss 6.8466   LearningRate 0.0217   Epoch: 10   Global Step: 132600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:16:07,472-Speed 2971.45 samples/sec   Loss 6.8567   LearningRate 0.0217   Epoch: 10   Global Step: 132610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:16:10,834-Speed 3046.80 samples/sec   Loss 6.8970   LearningRate 0.0217   Epoch: 10   Global Step: 132620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:16:14,253-Speed 2995.49 samples/sec   Loss 6.8670   LearningRate 0.0217   Epoch: 10   Global Step: 132630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:16:17,627-Speed 3036.52 samples/sec   Loss 6.7376   LearningRate 0.0217   Epoch: 10   Global Step: 132640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:16:21,085-Speed 2961.77 samples/sec   Loss 6.8257   LearningRate 0.0217   Epoch: 10   Global Step: 132650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:16:24,573-Speed 2937.16 samples/sec   Loss 6.7761   LearningRate 0.0217   Epoch: 10   Global Step: 132660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:16:27,948-Speed 3034.42 samples/sec   Loss 6.9045   LearningRate 0.0217   Epoch: 10   Global Step: 132670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:16:31,316-Speed 3041.98 samples/sec   Loss 6.7699   LearningRate 0.0217   Epoch: 10   Global Step: 132680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:16:34,685-Speed 3039.98 samples/sec   Loss 6.7053   LearningRate 0.0217   Epoch: 10   Global Step: 132690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:16:38,076-Speed 3020.75 samples/sec   Loss 6.6820   LearningRate 0.0217   Epoch: 10   Global Step: 132700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:16:41,495-Speed 2995.67 samples/sec   Loss 6.7936   LearningRate 0.0217   Epoch: 10   Global Step: 132710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:16:44,968-Speed 2949.02 samples/sec   Loss 6.7435   LearningRate 0.0217   Epoch: 10   Global Step: 132720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:16:48,401-Speed 2983.57 samples/sec   Loss 6.9441   LearningRate 0.0217   Epoch: 10   Global Step: 132730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:16:51,754-Speed 3055.06 samples/sec   Loss 6.7845   LearningRate 0.0217   Epoch: 10   Global Step: 132740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:16:55,098-Speed 3062.41 samples/sec   Loss 6.6395   LearningRate 0.0217   Epoch: 10   Global Step: 132750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:16:58,436-Speed 3069.24 samples/sec   Loss 6.8296   LearningRate 0.0217   Epoch: 10   Global Step: 132760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:17:01,861-Speed 2990.29 samples/sec   Loss 6.8023   LearningRate 0.0217   Epoch: 10   Global Step: 132770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:17:05,221-Speed 3048.95 samples/sec   Loss 6.7675   LearningRate 0.0217   Epoch: 10   Global Step: 132780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:17:08,633-Speed 3002.18 samples/sec   Loss 6.8558   LearningRate 0.0217   Epoch: 10   Global Step: 132790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:17:12,011-Speed 3031.72 samples/sec   Loss 6.5866   LearningRate 0.0217   Epoch: 10   Global Step: 132800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:17:15,448-Speed 2980.97 samples/sec   Loss 6.8686   LearningRate 0.0217   Epoch: 10   Global Step: 132810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:17:18,840-Speed 3019.70 samples/sec   Loss 6.8016   LearningRate 0.0217   Epoch: 10   Global Step: 132820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:17:22,211-Speed 3038.34 samples/sec   Loss 6.8787   LearningRate 0.0217   Epoch: 10   Global Step: 132830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:17:25,555-Speed 3062.32 samples/sec   Loss 6.7457   LearningRate 0.0216   Epoch: 10   Global Step: 132840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:17:28,898-Speed 3064.24 samples/sec   Loss 6.6790   LearningRate 0.0216   Epoch: 10   Global Step: 132850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:17:32,224-Speed 3079.86 samples/sec   Loss 6.7135   LearningRate 0.0216   Epoch: 10   Global Step: 132860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:17:35,545-Speed 3084.80 samples/sec   Loss 6.7954   LearningRate 0.0216   Epoch: 10   Global Step: 132870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:17:38,918-Speed 3036.69 samples/sec   Loss 6.9508   LearningRate 0.0216   Epoch: 10   Global Step: 132880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:17:42,279-Speed 3047.67 samples/sec   Loss 6.8323   LearningRate 0.0216   Epoch: 10   Global Step: 132890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:17:45,662-Speed 3028.20 samples/sec   Loss 6.8050   LearningRate 0.0216   Epoch: 10   Global Step: 132900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:17:49,064-Speed 3010.99 samples/sec   Loss 6.8051   LearningRate 0.0216   Epoch: 10   Global Step: 132910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:17:52,393-Speed 3077.50 samples/sec   Loss 6.7771   LearningRate 0.0216   Epoch: 10   Global Step: 132920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:17:55,731-Speed 3068.13 samples/sec   Loss 6.7956   LearningRate 0.0216   Epoch: 10   Global Step: 132930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:17:59,173-Speed 2975.99 samples/sec   Loss 6.7846   LearningRate 0.0216   Epoch: 10   Global Step: 132940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:18:02,634-Speed 2958.93 samples/sec   Loss 6.7579   LearningRate 0.0216   Epoch: 10   Global Step: 132950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:18:06,056-Speed 2993.46 samples/sec   Loss 6.9126   LearningRate 0.0216   Epoch: 10   Global Step: 132960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:18:09,415-Speed 3049.84 samples/sec   Loss 6.7026   LearningRate 0.0216   Epoch: 10   Global Step: 132970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:18:12,763-Speed 3059.29 samples/sec   Loss 6.7751   LearningRate 0.0216   Epoch: 10   Global Step: 132980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:18:16,150-Speed 3024.38 samples/sec   Loss 6.8485   LearningRate 0.0216   Epoch: 10   Global Step: 132990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:18:19,587-Speed 2980.66 samples/sec   Loss 6.7998   LearningRate 0.0216   Epoch: 10   Global Step: 133000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:18:22,886-Speed 3104.93 samples/sec   Loss 6.7967   LearningRate 0.0216   Epoch: 10   Global Step: 133010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:18:26,296-Speed 3003.15 samples/sec   Loss 6.7371   LearningRate 0.0216   Epoch: 10   Global Step: 133020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:18:29,673-Speed 3033.67 samples/sec   Loss 6.8872   LearningRate 0.0216   Epoch: 10   Global Step: 133030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:18:33,089-Speed 2998.37 samples/sec   Loss 6.8142   LearningRate 0.0216   Epoch: 10   Global Step: 133040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:18:36,566-Speed 2946.05 samples/sec   Loss 6.9085   LearningRate 0.0216   Epoch: 10   Global Step: 133050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:18:39,968-Speed 3010.62 samples/sec   Loss 6.7552   LearningRate 0.0216   Epoch: 10   Global Step: 133060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:18:43,347-Speed 3031.82 samples/sec   Loss 6.8331   LearningRate 0.0216   Epoch: 10   Global Step: 133070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:18:46,712-Speed 3043.91 samples/sec   Loss 6.7902   LearningRate 0.0216   Epoch: 10   Global Step: 133080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:18:50,153-Speed 2976.78 samples/sec   Loss 6.7343   LearningRate 0.0216   Epoch: 10   Global Step: 133090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:18:53,503-Speed 3057.21 samples/sec   Loss 6.8648   LearningRate 0.0215   Epoch: 10   Global Step: 133100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:18:56,885-Speed 3029.06 samples/sec   Loss 6.7655   LearningRate 0.0215   Epoch: 10   Global Step: 133110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:19:00,294-Speed 3004.68 samples/sec   Loss 6.7791   LearningRate 0.0215   Epoch: 10   Global Step: 133120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:19:03,760-Speed 2954.75 samples/sec   Loss 6.8485   LearningRate 0.0215   Epoch: 10   Global Step: 133130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:19:07,093-Speed 3073.16 samples/sec   Loss 6.9290   LearningRate 0.0215   Epoch: 10   Global Step: 133140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:19:10,466-Speed 3036.65 samples/sec   Loss 6.8010   LearningRate 0.0215   Epoch: 10   Global Step: 133150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:19:13,794-Speed 3078.24 samples/sec   Loss 6.7750   LearningRate 0.0215   Epoch: 10   Global Step: 133160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:19:17,163-Speed 3039.45 samples/sec   Loss 6.7536   LearningRate 0.0215   Epoch: 10   Global Step: 133170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:19:20,625-Speed 2960.18 samples/sec   Loss 6.6756   LearningRate 0.0215   Epoch: 10   Global Step: 133180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:19:24,083-Speed 2961.72 samples/sec   Loss 6.8254   LearningRate 0.0215   Epoch: 10   Global Step: 133190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:19:27,554-Speed 2950.64 samples/sec   Loss 6.8378   LearningRate 0.0215   Epoch: 10   Global Step: 133200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:19:30,987-Speed 2983.72 samples/sec   Loss 6.9121   LearningRate 0.0215   Epoch: 10   Global Step: 133210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:19:34,486-Speed 2927.47 samples/sec   Loss 6.8371   LearningRate 0.0215   Epoch: 10   Global Step: 133220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:19:37,941-Speed 2964.87 samples/sec   Loss 6.7192   LearningRate 0.0215   Epoch: 10   Global Step: 133230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:19:41,456-Speed 2913.87 samples/sec   Loss 6.8181   LearningRate 0.0215   Epoch: 10   Global Step: 133240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:19:44,848-Speed 3019.30 samples/sec   Loss 6.7540   LearningRate 0.0215   Epoch: 10   Global Step: 133250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:19:48,282-Speed 2983.49 samples/sec   Loss 6.7862   LearningRate 0.0215   Epoch: 10   Global Step: 133260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:19:51,633-Speed 3056.06 samples/sec   Loss 6.7479   LearningRate 0.0215   Epoch: 10   Global Step: 133270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:19:55,055-Speed 2993.83 samples/sec   Loss 6.7427   LearningRate 0.0215   Epoch: 10   Global Step: 133280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:19:58,531-Speed 2946.76 samples/sec   Loss 6.6397   LearningRate 0.0215   Epoch: 10   Global Step: 133290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:20:01,864-Speed 3073.52 samples/sec   Loss 6.8089   LearningRate 0.0215   Epoch: 10   Global Step: 133300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:20:05,317-Speed 2965.35 samples/sec   Loss 6.7641   LearningRate 0.0215   Epoch: 10   Global Step: 133310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:20:08,751-Speed 2983.47 samples/sec   Loss 6.6654   LearningRate 0.0215   Epoch: 10   Global Step: 133320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:20:12,255-Speed 2923.08 samples/sec   Loss 6.8342   LearningRate 0.0215   Epoch: 10   Global Step: 133330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:20:15,653-Speed 3014.05 samples/sec   Loss 6.7995   LearningRate 0.0215   Epoch: 10   Global Step: 133340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:20:19,100-Speed 2971.88 samples/sec   Loss 6.7568   LearningRate 0.0215   Epoch: 10   Global Step: 133350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:20:22,519-Speed 2995.60 samples/sec   Loss 6.7418   LearningRate 0.0215   Epoch: 10   Global Step: 133360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:20:25,902-Speed 3027.65 samples/sec   Loss 6.7533   LearningRate 0.0214   Epoch: 10   Global Step: 133370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:20:29,235-Speed 3073.44 samples/sec   Loss 6.8518   LearningRate 0.0214   Epoch: 10   Global Step: 133380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:20:32,566-Speed 3075.58 samples/sec   Loss 6.8656   LearningRate 0.0214   Epoch: 10   Global Step: 133390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:20:35,962-Speed 3015.87 samples/sec   Loss 6.7379   LearningRate 0.0214   Epoch: 10   Global Step: 133400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:20:39,347-Speed 3025.87 samples/sec   Loss 6.7517   LearningRate 0.0214   Epoch: 10   Global Step: 133410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:20:42,759-Speed 3002.41 samples/sec   Loss 6.8646   LearningRate 0.0214   Epoch: 10   Global Step: 133420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:20:46,168-Speed 3004.80 samples/sec   Loss 6.7334   LearningRate 0.0214   Epoch: 10   Global Step: 133430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:20:49,543-Speed 3035.53 samples/sec   Loss 6.7270   LearningRate 0.0214   Epoch: 10   Global Step: 133440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:20:52,941-Speed 3014.58 samples/sec   Loss 6.7182   LearningRate 0.0214   Epoch: 10   Global Step: 133450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:20:56,345-Speed 3008.50 samples/sec   Loss 6.6999   LearningRate 0.0214   Epoch: 10   Global Step: 133460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:20:59,778-Speed 2983.99 samples/sec   Loss 6.7477   LearningRate 0.0214   Epoch: 10   Global Step: 133470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:21:03,199-Speed 2993.93 samples/sec   Loss 6.6417   LearningRate 0.0214   Epoch: 10   Global Step: 133480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:21:06,567-Speed 3041.74 samples/sec   Loss 6.8403   LearningRate 0.0214   Epoch: 10   Global Step: 133490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:21:09,991-Speed 2991.39 samples/sec   Loss 6.7201   LearningRate 0.0214   Epoch: 10   Global Step: 133500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:21:13,411-Speed 2994.75 samples/sec   Loss 6.7271   LearningRate 0.0214   Epoch: 10   Global Step: 133510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:21:16,827-Speed 2998.64 samples/sec   Loss 6.6957   LearningRate 0.0214   Epoch: 10   Global Step: 133520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:21:20,180-Speed 3055.12 samples/sec   Loss 6.8281   LearningRate 0.0214   Epoch: 10   Global Step: 133530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:21:23,595-Speed 2998.67 samples/sec   Loss 6.7249   LearningRate 0.0214   Epoch: 10   Global Step: 133540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:21:26,990-Speed 3018.66 samples/sec   Loss 6.7928   LearningRate 0.0214   Epoch: 10   Global Step: 133550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:21:30,356-Speed 3042.97 samples/sec   Loss 6.7263   LearningRate 0.0214   Epoch: 10   Global Step: 133560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:21:33,743-Speed 3024.47 samples/sec   Loss 6.9165   LearningRate 0.0214   Epoch: 10   Global Step: 133570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:21:37,049-Speed 3098.02 samples/sec   Loss 6.7995   LearningRate 0.0214   Epoch: 10   Global Step: 133580   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:21:40,377-Speed 3077.85 samples/sec   Loss 6.8005   LearningRate 0.0214   Epoch: 10   Global Step: 133590   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:21:43,726-Speed 3058.37 samples/sec   Loss 6.6201   LearningRate 0.0214   Epoch: 10   Global Step: 133600   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:21:47,116-Speed 3021.01 samples/sec   Loss 6.7592   LearningRate 0.0214   Epoch: 10   Global Step: 133610   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:21:50,444-Speed 3078.39 samples/sec   Loss 6.7908   LearningRate 0.0214   Epoch: 10   Global Step: 133620   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:21:53,814-Speed 3039.37 samples/sec   Loss 6.7717   LearningRate 0.0214   Epoch: 10   Global Step: 133630   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:21:57,122-Speed 3095.97 samples/sec   Loss 6.6650   LearningRate 0.0213   Epoch: 10   Global Step: 133640   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:22:00,470-Speed 3059.42 samples/sec   Loss 6.7934   LearningRate 0.0213   Epoch: 10   Global Step: 133650   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:22:03,876-Speed 3007.77 samples/sec   Loss 6.7330   LearningRate 0.0213   Epoch: 10   Global Step: 133660   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 14:22:07,207-Speed 3075.02 samples/sec   Loss 6.8248   LearningRate 0.0213   Epoch: 10   Global Step: 133670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:22:10,541-Speed 3071.68 samples/sec   Loss 6.7906   LearningRate 0.0213   Epoch: 10   Global Step: 133680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:22:13,954-Speed 3001.53 samples/sec   Loss 6.7933   LearningRate 0.0213   Epoch: 10   Global Step: 133690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:22:17,288-Speed 3071.74 samples/sec   Loss 6.8587   LearningRate 0.0213   Epoch: 10   Global Step: 133700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:22:20,642-Speed 3054.60 samples/sec   Loss 6.7616   LearningRate 0.0213   Epoch: 10   Global Step: 133710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:22:23,997-Speed 3052.56 samples/sec   Loss 6.7928   LearningRate 0.0213   Epoch: 10   Global Step: 133720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:22:27,404-Speed 3006.33 samples/sec   Loss 6.7850   LearningRate 0.0213   Epoch: 10   Global Step: 133730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:22:30,815-Speed 3002.86 samples/sec   Loss 6.7900   LearningRate 0.0213   Epoch: 10   Global Step: 133740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:22:34,146-Speed 3075.55 samples/sec   Loss 6.6933   LearningRate 0.0213   Epoch: 10   Global Step: 133750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:22:37,484-Speed 3069.00 samples/sec   Loss 6.7466   LearningRate 0.0213   Epoch: 10   Global Step: 133760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:22:40,792-Speed 3095.82 samples/sec   Loss 6.7277   LearningRate 0.0213   Epoch: 10   Global Step: 133770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:22:44,137-Speed 3062.63 samples/sec   Loss 6.8653   LearningRate 0.0213   Epoch: 10   Global Step: 133780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:22:47,458-Speed 3084.48 samples/sec   Loss 6.6711   LearningRate 0.0213   Epoch: 10   Global Step: 133790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:22:50,822-Speed 3044.84 samples/sec   Loss 6.7480   LearningRate 0.0213   Epoch: 10   Global Step: 133800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:22:54,227-Speed 3007.51 samples/sec   Loss 6.7072   LearningRate 0.0213   Epoch: 10   Global Step: 133810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:22:57,633-Speed 3007.19 samples/sec   Loss 6.7265   LearningRate 0.0213   Epoch: 10   Global Step: 133820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:23:01,004-Speed 3038.60 samples/sec   Loss 6.7318   LearningRate 0.0213   Epoch: 10   Global Step: 133830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:23:04,471-Speed 2955.01 samples/sec   Loss 6.8288   LearningRate 0.0213   Epoch: 10   Global Step: 133840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:23:07,799-Speed 3077.69 samples/sec   Loss 6.6965   LearningRate 0.0213   Epoch: 10   Global Step: 133850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:23:11,202-Speed 3009.91 samples/sec   Loss 6.6940   LearningRate 0.0213   Epoch: 10   Global Step: 133860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:23:14,602-Speed 3012.79 samples/sec   Loss 6.6568   LearningRate 0.0213   Epoch: 10   Global Step: 133870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:23:17,969-Speed 3041.52 samples/sec   Loss 6.6355   LearningRate 0.0213   Epoch: 10   Global Step: 133880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:23:21,326-Speed 3051.22 samples/sec   Loss 6.7488   LearningRate 0.0213   Epoch: 10   Global Step: 133890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:23:24,660-Speed 3072.49 samples/sec   Loss 6.7121   LearningRate 0.0213   Epoch: 10   Global Step: 133900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:23:28,126-Speed 2955.10 samples/sec   Loss 6.7785   LearningRate 0.0212   Epoch: 10   Global Step: 133910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:23:31,594-Speed 2953.32 samples/sec   Loss 6.6387   LearningRate 0.0212   Epoch: 10   Global Step: 133920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:23:34,928-Speed 3073.11 samples/sec   Loss 6.7711   LearningRate 0.0212   Epoch: 10   Global Step: 133930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:23:38,322-Speed 3017.38 samples/sec   Loss 6.7839   LearningRate 0.0212   Epoch: 10   Global Step: 133940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:23:41,724-Speed 3011.00 samples/sec   Loss 6.7505   LearningRate 0.0212   Epoch: 10   Global Step: 133950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:23:45,140-Speed 2998.83 samples/sec   Loss 6.7474   LearningRate 0.0212   Epoch: 10   Global Step: 133960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:23:48,518-Speed 3032.18 samples/sec   Loss 6.7012   LearningRate 0.0212   Epoch: 10   Global Step: 133970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:23:51,899-Speed 3029.53 samples/sec   Loss 6.8335   LearningRate 0.0212   Epoch: 10   Global Step: 133980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:23:55,244-Speed 3062.70 samples/sec   Loss 6.7666   LearningRate 0.0212   Epoch: 10   Global Step: 133990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:23:58,636-Speed 3019.18 samples/sec   Loss 6.7985   LearningRate 0.0212   Epoch: 10   Global Step: 134000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:24:01,974-Speed 3069.52 samples/sec   Loss 6.5268   LearningRate 0.0212   Epoch: 10   Global Step: 134010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:05,394-Speed 2994.84 samples/sec   Loss 6.7400   LearningRate 0.0212   Epoch: 10   Global Step: 134020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:08,774-Speed 3030.57 samples/sec   Loss 6.8113   LearningRate 0.0212   Epoch: 10   Global Step: 134030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:12,233-Speed 2961.43 samples/sec   Loss 6.6668   LearningRate 0.0212   Epoch: 10   Global Step: 134040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:15,599-Speed 3042.81 samples/sec   Loss 6.7498   LearningRate 0.0212   Epoch: 10   Global Step: 134050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:18,940-Speed 3065.54 samples/sec   Loss 6.8575   LearningRate 0.0212   Epoch: 10   Global Step: 134060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:22,370-Speed 2986.60 samples/sec   Loss 6.8121   LearningRate 0.0212   Epoch: 10   Global Step: 134070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:25,693-Speed 3082.57 samples/sec   Loss 6.6543   LearningRate 0.0212   Epoch: 10   Global Step: 134080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:29,142-Speed 2969.21 samples/sec   Loss 6.7160   LearningRate 0.0212   Epoch: 10   Global Step: 134090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:32,496-Speed 3054.53 samples/sec   Loss 6.7653   LearningRate 0.0212   Epoch: 10   Global Step: 134100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:35,822-Speed 3079.57 samples/sec   Loss 6.6455   LearningRate 0.0212   Epoch: 10   Global Step: 134110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:24:39,181-Speed 3049.08 samples/sec   Loss 6.8411   LearningRate 0.0212   Epoch: 10   Global Step: 134120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:24:42,596-Speed 2999.21 samples/sec   Loss 6.6833   LearningRate 0.0212   Epoch: 10   Global Step: 134130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:24:45,910-Speed 3090.74 samples/sec   Loss 6.7158   LearningRate 0.0212   Epoch: 10   Global Step: 134140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:24:49,264-Speed 3054.14 samples/sec   Loss 6.8121   LearningRate 0.0212   Epoch: 10   Global Step: 134150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:52,676-Speed 3001.63 samples/sec   Loss 6.7191   LearningRate 0.0212   Epoch: 10   Global Step: 134160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:56,057-Speed 3029.87 samples/sec   Loss 6.6490   LearningRate 0.0212   Epoch: 10   Global Step: 134170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:24:59,487-Speed 2985.71 samples/sec   Loss 6.7120   LearningRate 0.0211   Epoch: 10   Global Step: 134180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:02,889-Speed 3011.29 samples/sec   Loss 6.6937   LearningRate 0.0211   Epoch: 10   Global Step: 134190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:06,234-Speed 3061.91 samples/sec   Loss 6.7751   LearningRate 0.0211   Epoch: 10   Global Step: 134200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:09,661-Speed 2988.87 samples/sec   Loss 6.7763   LearningRate 0.0211   Epoch: 10   Global Step: 134210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:13,137-Speed 2946.65 samples/sec   Loss 6.7720   LearningRate 0.0211   Epoch: 10   Global Step: 134220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:16,584-Speed 2971.28 samples/sec   Loss 6.7289   LearningRate 0.0211   Epoch: 10   Global Step: 134230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:20,020-Speed 2981.69 samples/sec   Loss 6.8570   LearningRate 0.0211   Epoch: 10   Global Step: 134240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:23,519-Speed 2926.92 samples/sec   Loss 6.6679   LearningRate 0.0211   Epoch: 10   Global Step: 134250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:26,869-Speed 3057.63 samples/sec   Loss 6.7733   LearningRate 0.0211   Epoch: 10   Global Step: 134260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:30,357-Speed 2936.75 samples/sec   Loss 6.6509   LearningRate 0.0211   Epoch: 10   Global Step: 134270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:33,705-Speed 3060.01 samples/sec   Loss 6.7582   LearningRate 0.0211   Epoch: 10   Global Step: 134280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:37,110-Speed 3007.33 samples/sec   Loss 6.7082   LearningRate 0.0211   Epoch: 10   Global Step: 134290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:40,458-Speed 3059.65 samples/sec   Loss 6.6552   LearningRate 0.0211   Epoch: 10   Global Step: 134300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:43,789-Speed 3075.02 samples/sec   Loss 6.7575   LearningRate 0.0211   Epoch: 10   Global Step: 134310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:47,150-Speed 3047.93 samples/sec   Loss 6.8296   LearningRate 0.0211   Epoch: 10   Global Step: 134320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:50,490-Speed 3066.43 samples/sec   Loss 6.6745   LearningRate 0.0211   Epoch: 10   Global Step: 134330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:53,931-Speed 2979.98 samples/sec   Loss 6.6452   LearningRate 0.0211   Epoch: 10   Global Step: 134340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:25:57,400-Speed 2951.91 samples/sec   Loss 6.7850   LearningRate 0.0211   Epoch: 10   Global Step: 134350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:26:00,867-Speed 2954.34 samples/sec   Loss 6.7389   LearningRate 0.0211   Epoch: 10   Global Step: 134360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:26:04,225-Speed 3050.63 samples/sec   Loss 6.7091   LearningRate 0.0211   Epoch: 10   Global Step: 134370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:26:07,587-Speed 3046.34 samples/sec   Loss 6.7090   LearningRate 0.0211   Epoch: 10   Global Step: 134380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:26:10,993-Speed 3007.28 samples/sec   Loss 6.7836   LearningRate 0.0211   Epoch: 10   Global Step: 134390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:26:14,366-Speed 3037.09 samples/sec   Loss 6.8215   LearningRate 0.0211   Epoch: 10   Global Step: 134400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:26:17,768-Speed 3010.77 samples/sec   Loss 6.6070   LearningRate 0.0211   Epoch: 10   Global Step: 134410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:26:21,210-Speed 2975.89 samples/sec   Loss 6.6827   LearningRate 0.0211   Epoch: 10   Global Step: 134420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:26:24,524-Speed 3091.28 samples/sec   Loss 6.7899   LearningRate 0.0211   Epoch: 10   Global Step: 134430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:26:27,867-Speed 3063.80 samples/sec   Loss 6.6656   LearningRate 0.0211   Epoch: 10   Global Step: 134440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:26:31,220-Speed 3055.60 samples/sec   Loss 6.6813   LearningRate 0.0210   Epoch: 10   Global Step: 134450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:26:34,540-Speed 3084.42 samples/sec   Loss 6.6692   LearningRate 0.0210   Epoch: 10   Global Step: 134460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:26:37,925-Speed 3025.92 samples/sec   Loss 6.8893   LearningRate 0.0210   Epoch: 10   Global Step: 134470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:26:41,265-Speed 3067.26 samples/sec   Loss 6.8178   LearningRate 0.0210   Epoch: 10   Global Step: 134480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:26:44,641-Speed 3033.76 samples/sec   Loss 6.7368   LearningRate 0.0210   Epoch: 10   Global Step: 134490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:26:48,089-Speed 2970.95 samples/sec   Loss 6.6239   LearningRate 0.0210   Epoch: 10   Global Step: 134500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:26:51,448-Speed 3049.66 samples/sec   Loss 6.8047   LearningRate 0.0210   Epoch: 10   Global Step: 134510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:26:54,848-Speed 3012.64 samples/sec   Loss 6.6042   LearningRate 0.0210   Epoch: 10   Global Step: 134520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:26:58,180-Speed 3073.57 samples/sec   Loss 6.5434   LearningRate 0.0210   Epoch: 10   Global Step: 134530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:27:01,506-Speed 3080.25 samples/sec   Loss 6.7921   LearningRate 0.0210   Epoch: 10   Global Step: 134540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:27:04,913-Speed 3006.40 samples/sec   Loss 6.6585   LearningRate 0.0210   Epoch: 10   Global Step: 134550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:27:08,254-Speed 3065.50 samples/sec   Loss 6.8075   LearningRate 0.0210   Epoch: 10   Global Step: 134560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:27:11,680-Speed 2990.18 samples/sec   Loss 6.5615   LearningRate 0.0210   Epoch: 10   Global Step: 134570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:27:15,073-Speed 3018.62 samples/sec   Loss 6.6299   LearningRate 0.0210   Epoch: 10   Global Step: 134580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:27:18,433-Speed 3048.71 samples/sec   Loss 6.6199   LearningRate 0.0210   Epoch: 10   Global Step: 134590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:27:21,895-Speed 2958.15 samples/sec   Loss 6.7299   LearningRate 0.0210   Epoch: 10   Global Step: 134600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:27:25,289-Speed 3018.30 samples/sec   Loss 6.7011   LearningRate 0.0210   Epoch: 10   Global Step: 134610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:27:28,636-Speed 3060.72 samples/sec   Loss 6.5530   LearningRate 0.0210   Epoch: 10   Global Step: 134620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:27:32,039-Speed 3010.04 samples/sec   Loss 6.7532   LearningRate 0.0210   Epoch: 10   Global Step: 134630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:27:35,519-Speed 2943.28 samples/sec   Loss 6.7940   LearningRate 0.0210   Epoch: 10   Global Step: 134640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:27:38,989-Speed 2951.49 samples/sec   Loss 6.6959   LearningRate 0.0210   Epoch: 10   Global Step: 134650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:27:42,405-Speed 2998.60 samples/sec   Loss 6.6255   LearningRate 0.0210   Epoch: 10   Global Step: 134660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:27:45,802-Speed 3014.64 samples/sec   Loss 6.7328   LearningRate 0.0210   Epoch: 10   Global Step: 134670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:27:49,123-Speed 3084.97 samples/sec   Loss 6.7336   LearningRate 0.0210   Epoch: 10   Global Step: 134680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:27:52,498-Speed 3034.94 samples/sec   Loss 6.5865   LearningRate 0.0210   Epoch: 10   Global Step: 134690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:27:55,902-Speed 3008.56 samples/sec   Loss 6.6865   LearningRate 0.0210   Epoch: 10   Global Step: 134700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:27:59,285-Speed 3027.87 samples/sec   Loss 6.7482   LearningRate 0.0210   Epoch: 10   Global Step: 134710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:28:02,683-Speed 3014.51 samples/sec   Loss 6.5904   LearningRate 0.0209   Epoch: 10   Global Step: 134720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:28:06,171-Speed 2936.34 samples/sec   Loss 6.6960   LearningRate 0.0209   Epoch: 10   Global Step: 134730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:28:09,558-Speed 3024.21 samples/sec   Loss 6.7815   LearningRate 0.0209   Epoch: 10   Global Step: 134740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:28:12,924-Speed 3042.93 samples/sec   Loss 6.7065   LearningRate 0.0209   Epoch: 10   Global Step: 134750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:28:16,238-Speed 3090.62 samples/sec   Loss 6.7258   LearningRate 0.0209   Epoch: 10   Global Step: 134760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:28:19,644-Speed 3007.71 samples/sec   Loss 6.5918   LearningRate 0.0209   Epoch: 10   Global Step: 134770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:28:23,014-Speed 3039.95 samples/sec   Loss 6.6348   LearningRate 0.0209   Epoch: 10   Global Step: 134780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:28:26,364-Speed 3057.46 samples/sec   Loss 6.7151   LearningRate 0.0209   Epoch: 10   Global Step: 134790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:28:29,802-Speed 2979.15 samples/sec   Loss 6.5646   LearningRate 0.0209   Epoch: 10   Global Step: 134800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:28:33,192-Speed 3021.78 samples/sec   Loss 6.7168   LearningRate 0.0209   Epoch: 10   Global Step: 134810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:28:36,590-Speed 3014.69 samples/sec   Loss 6.8067   LearningRate 0.0209   Epoch: 10   Global Step: 134820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:28:39,981-Speed 3020.53 samples/sec   Loss 6.6760   LearningRate 0.0209   Epoch: 10   Global Step: 134830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:28:43,377-Speed 3016.37 samples/sec   Loss 6.7691   LearningRate 0.0209   Epoch: 10   Global Step: 134840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:28:46,766-Speed 3022.70 samples/sec   Loss 6.7517   LearningRate 0.0209   Epoch: 10   Global Step: 134850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:28:50,110-Speed 3063.92 samples/sec   Loss 6.6255   LearningRate 0.0209   Epoch: 10   Global Step: 134860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:28:53,466-Speed 3052.32 samples/sec   Loss 6.7299   LearningRate 0.0209   Epoch: 10   Global Step: 134870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:28:56,846-Speed 3030.08 samples/sec   Loss 6.8155   LearningRate 0.0209   Epoch: 10   Global Step: 134880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:29:00,275-Speed 2986.87 samples/sec   Loss 6.7421   LearningRate 0.0209   Epoch: 10   Global Step: 134890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:29:03,696-Speed 2994.60 samples/sec   Loss 6.7622   LearningRate 0.0209   Epoch: 10   Global Step: 134900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:29:07,190-Speed 2931.39 samples/sec   Loss 6.6213   LearningRate 0.0209   Epoch: 10   Global Step: 134910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:29:10,588-Speed 3014.42 samples/sec   Loss 6.6378   LearningRate 0.0209   Epoch: 10   Global Step: 134920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:29:14,022-Speed 2982.58 samples/sec   Loss 6.5786   LearningRate 0.0209   Epoch: 10   Global Step: 134930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:29:17,491-Speed 2953.00 samples/sec   Loss 6.6889   LearningRate 0.0209   Epoch: 10   Global Step: 134940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:29:20,921-Speed 2986.30 samples/sec   Loss 6.8094   LearningRate 0.0209   Epoch: 10   Global Step: 134950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:29:24,352-Speed 2985.85 samples/sec   Loss 6.5753   LearningRate 0.0209   Epoch: 10   Global Step: 134960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:29:27,746-Speed 3017.16 samples/sec   Loss 6.6239   LearningRate 0.0209   Epoch: 10   Global Step: 134970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:29:31,159-Speed 3001.50 samples/sec   Loss 6.6525   LearningRate 0.0209   Epoch: 10   Global Step: 134980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:29:34,545-Speed 3024.97 samples/sec   Loss 6.7705   LearningRate 0.0208   Epoch: 10   Global Step: 134990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:29:37,907-Speed 3047.01 samples/sec   Loss 6.7116   LearningRate 0.0208   Epoch: 10   Global Step: 135000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:29:41,357-Speed 2969.44 samples/sec   Loss 6.6151   LearningRate 0.0208   Epoch: 10   Global Step: 135010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:29:44,707-Speed 3057.04 samples/sec   Loss 6.6264   LearningRate 0.0208   Epoch: 10   Global Step: 135020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:29:48,206-Speed 2927.48 samples/sec   Loss 6.8352   LearningRate 0.0208   Epoch: 10   Global Step: 135030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:29:51,587-Speed 3030.19 samples/sec   Loss 6.6125   LearningRate 0.0208   Epoch: 10   Global Step: 135040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:29:54,998-Speed 3003.00 samples/sec   Loss 6.7222   LearningRate 0.0208   Epoch: 10   Global Step: 135050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:29:58,353-Speed 3052.16 samples/sec   Loss 6.6792   LearningRate 0.0208   Epoch: 10   Global Step: 135060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:30:01,722-Speed 3040.27 samples/sec   Loss 6.6919   LearningRate 0.0208   Epoch: 10   Global Step: 135070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:30:05,102-Speed 3030.80 samples/sec   Loss 6.5576   LearningRate 0.0208   Epoch: 10   Global Step: 135080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:30:08,489-Speed 3024.05 samples/sec   Loss 6.7027   LearningRate 0.0208   Epoch: 10   Global Step: 135090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:30:11,865-Speed 3033.91 samples/sec   Loss 6.6898   LearningRate 0.0208   Epoch: 10   Global Step: 135100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:30:15,276-Speed 3002.91 samples/sec   Loss 6.7408   LearningRate 0.0208   Epoch: 10   Global Step: 135110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:30:18,637-Speed 3048.19 samples/sec   Loss 6.6598   LearningRate 0.0208   Epoch: 10   Global Step: 135120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:30:22,092-Speed 2964.65 samples/sec   Loss 6.7730   LearningRate 0.0208   Epoch: 10   Global Step: 135130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:30:25,497-Speed 3007.52 samples/sec   Loss 6.6537   LearningRate 0.0208   Epoch: 10   Global Step: 135140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:30:28,928-Speed 2985.76 samples/sec   Loss 6.6039   LearningRate 0.0208   Epoch: 10   Global Step: 135150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:30:32,255-Speed 3078.43 samples/sec   Loss 6.7908   LearningRate 0.0208   Epoch: 10   Global Step: 135160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:30:35,674-Speed 2996.12 samples/sec   Loss 6.6602   LearningRate 0.0208   Epoch: 10   Global Step: 135170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:30:39,039-Speed 3044.19 samples/sec   Loss 6.7594   LearningRate 0.0208   Epoch: 10   Global Step: 135180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:30:42,378-Speed 3066.99 samples/sec   Loss 6.7041   LearningRate 0.0208   Epoch: 10   Global Step: 135190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:30:45,787-Speed 3004.71 samples/sec   Loss 6.7140   LearningRate 0.0208   Epoch: 10   Global Step: 135200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:30:49,165-Speed 3032.59 samples/sec   Loss 6.6797   LearningRate 0.0208   Epoch: 10   Global Step: 135210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:30:52,635-Speed 2951.62 samples/sec   Loss 6.7401   LearningRate 0.0208   Epoch: 10   Global Step: 135220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:30:56,002-Speed 3042.07 samples/sec   Loss 6.6568   LearningRate 0.0208   Epoch: 10   Global Step: 135230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:30:59,398-Speed 3016.31 samples/sec   Loss 6.7653   LearningRate 0.0208   Epoch: 10   Global Step: 135240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:02,767-Speed 3040.42 samples/sec   Loss 6.6693   LearningRate 0.0208   Epoch: 10   Global Step: 135250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:06,112-Speed 3061.80 samples/sec   Loss 6.7425   LearningRate 0.0207   Epoch: 10   Global Step: 135260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:09,530-Speed 2997.40 samples/sec   Loss 6.6140   LearningRate 0.0207   Epoch: 10   Global Step: 135270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:12,991-Speed 2958.92 samples/sec   Loss 6.5934   LearningRate 0.0207   Epoch: 10   Global Step: 135280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:16,468-Speed 2946.19 samples/sec   Loss 6.7520   LearningRate 0.0207   Epoch: 10   Global Step: 135290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:19,839-Speed 3038.29 samples/sec   Loss 6.6525   LearningRate 0.0207   Epoch: 10   Global Step: 135300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:23,200-Speed 3047.71 samples/sec   Loss 6.7652   LearningRate 0.0207   Epoch: 10   Global Step: 135310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:26,591-Speed 3020.56 samples/sec   Loss 6.6618   LearningRate 0.0207   Epoch: 10   Global Step: 135320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:29,951-Speed 3048.71 samples/sec   Loss 6.6387   LearningRate 0.0207   Epoch: 10   Global Step: 135330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:33,346-Speed 3017.06 samples/sec   Loss 6.5680   LearningRate 0.0207   Epoch: 10   Global Step: 135340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:31:36,757-Speed 3003.15 samples/sec   Loss 6.7150   LearningRate 0.0207   Epoch: 10   Global Step: 135350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:40,258-Speed 2925.20 samples/sec   Loss 6.5999   LearningRate 0.0207   Epoch: 10   Global Step: 135360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:43,652-Speed 3017.63 samples/sec   Loss 6.6921   LearningRate 0.0207   Epoch: 10   Global Step: 135370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:46,962-Speed 3095.14 samples/sec   Loss 6.7189   LearningRate 0.0207   Epoch: 10   Global Step: 135380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:50,285-Speed 3083.13 samples/sec   Loss 6.6668   LearningRate 0.0207   Epoch: 10   Global Step: 135390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:53,714-Speed 2987.04 samples/sec   Loss 6.6441   LearningRate 0.0207   Epoch: 10   Global Step: 135400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:31:57,064-Speed 3057.52 samples/sec   Loss 6.6602   LearningRate 0.0207   Epoch: 10   Global Step: 135410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:32:00,454-Speed 3022.12 samples/sec   Loss 6.5631   LearningRate 0.0207   Epoch: 10   Global Step: 135420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:32:03,813-Speed 3049.28 samples/sec   Loss 6.6409   LearningRate 0.0207   Epoch: 10   Global Step: 135430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:32:07,165-Speed 3055.65 samples/sec   Loss 6.5378   LearningRate 0.0207   Epoch: 10   Global Step: 135440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:32:10,550-Speed 3025.56 samples/sec   Loss 6.7514   LearningRate 0.0207   Epoch: 10   Global Step: 135450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:32:13,907-Speed 3051.45 samples/sec   Loss 6.6708   LearningRate 0.0207   Epoch: 10   Global Step: 135460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:32:17,269-Speed 3045.93 samples/sec   Loss 6.6109   LearningRate 0.0207   Epoch: 10   Global Step: 135470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:32:20,611-Speed 3066.12 samples/sec   Loss 6.6637   LearningRate 0.0207   Epoch: 10   Global Step: 135480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:32:23,944-Speed 3073.04 samples/sec   Loss 6.6131   LearningRate 0.0207   Epoch: 10   Global Step: 135490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:32:27,313-Speed 3040.17 samples/sec   Loss 6.6753   LearningRate 0.0207   Epoch: 10   Global Step: 135500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:32:30,731-Speed 2997.02 samples/sec   Loss 6.7069   LearningRate 0.0207   Epoch: 10   Global Step: 135510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:32:34,093-Speed 3047.09 samples/sec   Loss 6.5888   LearningRate 0.0207   Epoch: 10   Global Step: 135520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:32:37,455-Speed 3046.06 samples/sec   Loss 6.6052   LearningRate 0.0207   Epoch: 10   Global Step: 135530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:32:40,812-Speed 3051.92 samples/sec   Loss 6.7051   LearningRate 0.0206   Epoch: 10   Global Step: 135540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:32:44,217-Speed 3007.99 samples/sec   Loss 6.6214   LearningRate 0.0206   Epoch: 10   Global Step: 135550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:32:47,558-Speed 3065.66 samples/sec   Loss 6.5986   LearningRate 0.0206   Epoch: 10   Global Step: 135560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:32:50,896-Speed 3068.96 samples/sec   Loss 6.6786   LearningRate 0.0206   Epoch: 10   Global Step: 135570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:32:54,221-Speed 3080.41 samples/sec   Loss 6.6517   LearningRate 0.0206   Epoch: 10   Global Step: 135580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:32:57,536-Speed 3089.66 samples/sec   Loss 6.5678   LearningRate 0.0206   Epoch: 10   Global Step: 135590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:33:00,894-Speed 3050.51 samples/sec   Loss 6.6402   LearningRate 0.0206   Epoch: 10   Global Step: 135600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:33:04,186-Speed 3111.73 samples/sec   Loss 6.6649   LearningRate 0.0206   Epoch: 10   Global Step: 135610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:33:07,553-Speed 3042.07 samples/sec   Loss 6.6594   LearningRate 0.0206   Epoch: 10   Global Step: 135620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:33:10,893-Speed 3066.85 samples/sec   Loss 6.5700   LearningRate 0.0206   Epoch: 10   Global Step: 135630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:33:14,237-Speed 3062.73 samples/sec   Loss 6.4951   LearningRate 0.0206   Epoch: 10   Global Step: 135640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:33:17,588-Speed 3056.90 samples/sec   Loss 6.7209   LearningRate 0.0206   Epoch: 10   Global Step: 135650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:33:20,979-Speed 3020.36 samples/sec   Loss 6.6136   LearningRate 0.0206   Epoch: 10   Global Step: 135660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:33:24,369-Speed 3023.27 samples/sec   Loss 6.6705   LearningRate 0.0206   Epoch: 10   Global Step: 135670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:33:27,737-Speed 3041.12 samples/sec   Loss 6.6037   LearningRate 0.0206   Epoch: 10   Global Step: 135680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:33:31,059-Speed 3083.17 samples/sec   Loss 6.6639   LearningRate 0.0206   Epoch: 10   Global Step: 135690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:33:34,408-Speed 3058.71 samples/sec   Loss 6.6862   LearningRate 0.0206   Epoch: 10   Global Step: 135700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:33:37,792-Speed 3026.41 samples/sec   Loss 6.6306   LearningRate 0.0206   Epoch: 10   Global Step: 135710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:33:41,118-Speed 3079.87 samples/sec   Loss 6.6851   LearningRate 0.0206   Epoch: 10   Global Step: 135720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:33:44,537-Speed 2995.68 samples/sec   Loss 6.6969   LearningRate 0.0206   Epoch: 10   Global Step: 135730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:33:47,889-Speed 3055.84 samples/sec   Loss 6.5961   LearningRate 0.0206   Epoch: 10   Global Step: 135740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:33:51,292-Speed 3009.47 samples/sec   Loss 6.6308   LearningRate 0.0206   Epoch: 10   Global Step: 135750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:33:54,628-Speed 3071.69 samples/sec   Loss 6.6251   LearningRate 0.0206   Epoch: 10   Global Step: 135760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:33:58,070-Speed 2975.58 samples/sec   Loss 6.5104   LearningRate 0.0206   Epoch: 10   Global Step: 135770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:34:01,408-Speed 3068.56 samples/sec   Loss 6.6992   LearningRate 0.0206   Epoch: 10   Global Step: 135780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:34:04,711-Speed 3101.00 samples/sec   Loss 6.5488   LearningRate 0.0206   Epoch: 10   Global Step: 135790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:34:08,020-Speed 3095.31 samples/sec   Loss 6.5176   LearningRate 0.0206   Epoch: 10   Global Step: 135800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:34:11,393-Speed 3036.51 samples/sec   Loss 6.6322   LearningRate 0.0205   Epoch: 10   Global Step: 135810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:34:14,933-Speed 2893.58 samples/sec   Loss 6.6978   LearningRate 0.0205   Epoch: 10   Global Step: 135820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:34:18,278-Speed 3062.24 samples/sec   Loss 6.5880   LearningRate 0.0205   Epoch: 10   Global Step: 135830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:34:21,614-Speed 3070.29 samples/sec   Loss 6.6148   LearningRate 0.0205   Epoch: 10   Global Step: 135840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:34:24,989-Speed 3035.01 samples/sec   Loss 6.6080   LearningRate 0.0205   Epoch: 10   Global Step: 135850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:34:28,418-Speed 2987.90 samples/sec   Loss 6.6588   LearningRate 0.0205   Epoch: 10   Global Step: 135860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:34:31,835-Speed 2997.44 samples/sec   Loss 6.6144   LearningRate 0.0205   Epoch: 10   Global Step: 135870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:34:35,232-Speed 3015.81 samples/sec   Loss 6.7454   LearningRate 0.0205   Epoch: 10   Global Step: 135880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:34:38,572-Speed 3067.11 samples/sec   Loss 6.6151   LearningRate 0.0205   Epoch: 10   Global Step: 135890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:34:41,945-Speed 3036.43 samples/sec   Loss 6.6313   LearningRate 0.0205   Epoch: 10   Global Step: 135900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:34:45,298-Speed 3054.94 samples/sec   Loss 6.6977   LearningRate 0.0205   Epoch: 10   Global Step: 135910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:34:48,651-Speed 3054.43 samples/sec   Loss 6.5718   LearningRate 0.0205   Epoch: 10   Global Step: 135920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:34:52,077-Speed 2990.20 samples/sec   Loss 6.7057   LearningRate 0.0205   Epoch: 10   Global Step: 135930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:34:55,488-Speed 3003.48 samples/sec   Loss 6.7228   LearningRate 0.0205   Epoch: 10   Global Step: 135940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:34:58,821-Speed 3072.99 samples/sec   Loss 6.6301   LearningRate 0.0205   Epoch: 10   Global Step: 135950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:35:02,173-Speed 3055.43 samples/sec   Loss 6.6238   LearningRate 0.0205   Epoch: 10   Global Step: 135960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:35:05,510-Speed 3070.14 samples/sec   Loss 6.6507   LearningRate 0.0205   Epoch: 10   Global Step: 135970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:35:08,954-Speed 2973.15 samples/sec   Loss 6.6445   LearningRate 0.0205   Epoch: 10   Global Step: 135980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:35:12,361-Speed 3006.99 samples/sec   Loss 6.7366   LearningRate 0.0205   Epoch: 10   Global Step: 135990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:35:15,800-Speed 2978.44 samples/sec   Loss 6.6240   LearningRate 0.0205   Epoch: 10   Global Step: 136000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:35:19,237-Speed 2979.88 samples/sec   Loss 6.5670   LearningRate 0.0205   Epoch: 10   Global Step: 136010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:35:22,588-Speed 3056.87 samples/sec   Loss 6.6214   LearningRate 0.0205   Epoch: 10   Global Step: 136020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:35:25,915-Speed 3078.45 samples/sec   Loss 6.6541   LearningRate 0.0205   Epoch: 10   Global Step: 136030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:35:29,255-Speed 3067.25 samples/sec   Loss 6.6002   LearningRate 0.0205   Epoch: 10   Global Step: 136040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:35:32,665-Speed 3003.68 samples/sec   Loss 6.7018   LearningRate 0.0205   Epoch: 10   Global Step: 136050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:35:36,034-Speed 3039.96 samples/sec   Loss 6.7114   LearningRate 0.0205   Epoch: 10   Global Step: 136060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:35:39,370-Speed 3071.20 samples/sec   Loss 6.5462   LearningRate 0.0205   Epoch: 10   Global Step: 136070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:35:42,673-Speed 3100.97 samples/sec   Loss 6.5666   LearningRate 0.0205   Epoch: 10   Global Step: 136080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:35:45,986-Speed 3091.43 samples/sec   Loss 6.6022   LearningRate 0.0204   Epoch: 10   Global Step: 136090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:35:49,373-Speed 3024.43 samples/sec   Loss 6.5212   LearningRate 0.0204   Epoch: 10   Global Step: 136100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:35:52,727-Speed 3053.59 samples/sec   Loss 6.6345   LearningRate 0.0204   Epoch: 10   Global Step: 136110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:35:56,122-Speed 3017.08 samples/sec   Loss 6.5538   LearningRate 0.0204   Epoch: 10   Global Step: 136120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:35:59,489-Speed 3041.71 samples/sec   Loss 6.5686   LearningRate 0.0204   Epoch: 10   Global Step: 136130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:02,809-Speed 3085.18 samples/sec   Loss 6.6447   LearningRate 0.0204   Epoch: 10   Global Step: 136140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:06,265-Speed 2964.23 samples/sec   Loss 6.6816   LearningRate 0.0204   Epoch: 10   Global Step: 136150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:09,658-Speed 3018.33 samples/sec   Loss 6.5961   LearningRate 0.0204   Epoch: 10   Global Step: 136160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:13,036-Speed 3032.38 samples/sec   Loss 6.5700   LearningRate 0.0204   Epoch: 10   Global Step: 136170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:16,395-Speed 3049.52 samples/sec   Loss 6.5425   LearningRate 0.0204   Epoch: 10   Global Step: 136180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:19,729-Speed 3072.75 samples/sec   Loss 6.5381   LearningRate 0.0204   Epoch: 10   Global Step: 136190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:23,041-Speed 3092.43 samples/sec   Loss 6.5612   LearningRate 0.0204   Epoch: 10   Global Step: 136200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:26,369-Speed 3077.10 samples/sec   Loss 6.7014   LearningRate 0.0204   Epoch: 10   Global Step: 136210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:29,762-Speed 3018.98 samples/sec   Loss 6.5095   LearningRate 0.0204   Epoch: 10   Global Step: 136220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:36:33,131-Speed 3041.13 samples/sec   Loss 6.5237   LearningRate 0.0204   Epoch: 10   Global Step: 136230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:36,457-Speed 3079.81 samples/sec   Loss 6.6785   LearningRate 0.0204   Epoch: 10   Global Step: 136240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:39,820-Speed 3045.68 samples/sec   Loss 6.5912   LearningRate 0.0204   Epoch: 10   Global Step: 136250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:43,236-Speed 2998.22 samples/sec   Loss 6.5565   LearningRate 0.0204   Epoch: 10   Global Step: 136260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:46,601-Speed 3044.08 samples/sec   Loss 6.5500   LearningRate 0.0204   Epoch: 10   Global Step: 136270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:50,025-Speed 2991.32 samples/sec   Loss 6.5322   LearningRate 0.0204   Epoch: 10   Global Step: 136280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:53,400-Speed 3035.44 samples/sec   Loss 6.5444   LearningRate 0.0204   Epoch: 10   Global Step: 136290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:36:56,715-Speed 3089.78 samples/sec   Loss 6.5729   LearningRate 0.0204   Epoch: 10   Global Step: 136300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:00,054-Speed 3067.86 samples/sec   Loss 6.5298   LearningRate 0.0204   Epoch: 10   Global Step: 136310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:03,506-Speed 2967.59 samples/sec   Loss 6.8303   LearningRate 0.0204   Epoch: 10   Global Step: 136320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:06,854-Speed 3059.65 samples/sec   Loss 6.5975   LearningRate 0.0204   Epoch: 10   Global Step: 136330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:10,244-Speed 3020.76 samples/sec   Loss 6.6299   LearningRate 0.0204   Epoch: 10   Global Step: 136340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:13,625-Speed 3030.01 samples/sec   Loss 6.5539   LearningRate 0.0204   Epoch: 10   Global Step: 136350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:16,956-Speed 3074.79 samples/sec   Loss 6.5770   LearningRate 0.0203   Epoch: 10   Global Step: 136360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:20,440-Speed 2940.47 samples/sec   Loss 6.6013   LearningRate 0.0203   Epoch: 10   Global Step: 136370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:23,849-Speed 3004.15 samples/sec   Loss 6.6586   LearningRate 0.0203   Epoch: 10   Global Step: 136380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:27,321-Speed 2949.67 samples/sec   Loss 6.5766   LearningRate 0.0203   Epoch: 10   Global Step: 136390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:30,766-Speed 2973.50 samples/sec   Loss 6.5236   LearningRate 0.0203   Epoch: 10   Global Step: 136400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:37:34,168-Speed 3010.71 samples/sec   Loss 6.7258   LearningRate 0.0203   Epoch: 10   Global Step: 136410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:37:37,542-Speed 3036.19 samples/sec   Loss 6.5807   LearningRate 0.0203   Epoch: 10   Global Step: 136420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:37:40,862-Speed 3085.08 samples/sec   Loss 6.6613   LearningRate 0.0203   Epoch: 10   Global Step: 136430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:37:44,270-Speed 3006.05 samples/sec   Loss 6.5296   LearningRate 0.0203   Epoch: 10   Global Step: 136440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:47,584-Speed 3091.04 samples/sec   Loss 6.5859   LearningRate 0.0203   Epoch: 10   Global Step: 136450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:50,930-Speed 3062.07 samples/sec   Loss 6.6202   LearningRate 0.0203   Epoch: 10   Global Step: 136460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:54,380-Speed 2969.21 samples/sec   Loss 6.5933   LearningRate 0.0203   Epoch: 10   Global Step: 136470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:37:57,749-Speed 3040.36 samples/sec   Loss 6.6265   LearningRate 0.0203   Epoch: 10   Global Step: 136480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:01,073-Speed 3081.25 samples/sec   Loss 6.6391   LearningRate 0.0203   Epoch: 10   Global Step: 136490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:04,500-Speed 2988.68 samples/sec   Loss 6.6145   LearningRate 0.0203   Epoch: 10   Global Step: 136500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:07,901-Speed 3012.50 samples/sec   Loss 6.5754   LearningRate 0.0203   Epoch: 10   Global Step: 136510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:11,253-Speed 3056.07 samples/sec   Loss 6.5791   LearningRate 0.0203   Epoch: 10   Global Step: 136520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:14,571-Speed 3087.04 samples/sec   Loss 6.6689   LearningRate 0.0203   Epoch: 10   Global Step: 136530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:17,875-Speed 3100.30 samples/sec   Loss 6.6408   LearningRate 0.0203   Epoch: 10   Global Step: 136540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:38:21,179-Speed 3099.96 samples/sec   Loss 6.5580   LearningRate 0.0203   Epoch: 10   Global Step: 136550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:24,651-Speed 2950.14 samples/sec   Loss 6.6209   LearningRate 0.0203   Epoch: 10   Global Step: 136560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:28,117-Speed 2955.21 samples/sec   Loss 6.5082   LearningRate 0.0203   Epoch: 10   Global Step: 136570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:31,512-Speed 3017.01 samples/sec   Loss 6.5578   LearningRate 0.0203   Epoch: 10   Global Step: 136580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:34,930-Speed 2996.78 samples/sec   Loss 6.5432   LearningRate 0.0203   Epoch: 10   Global Step: 136590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:38,237-Speed 3098.47 samples/sec   Loss 6.6389   LearningRate 0.0203   Epoch: 10   Global Step: 136600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:41,673-Speed 2980.25 samples/sec   Loss 6.6207   LearningRate 0.0203   Epoch: 10   Global Step: 136610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:45,407-Speed 2743.47 samples/sec   Loss 6.5384   LearningRate 0.0203   Epoch: 10   Global Step: 136620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:38:48,829-Speed 2992.99 samples/sec   Loss 6.6018   LearningRate 0.0203   Epoch: 10   Global Step: 136630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:39:22,431-Speed 304.75 samples/sec   Loss 5.2470   LearningRate 0.0202   Epoch: 11   Global Step: 136640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:39:26,255-Speed 2678.44 samples/sec   Loss 5.0876   LearningRate 0.0202   Epoch: 11   Global Step: 136650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:39:29,656-Speed 3011.58 samples/sec   Loss 5.0402   LearningRate 0.0202   Epoch: 11   Global Step: 136660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:39:33,056-Speed 3013.36 samples/sec   Loss 5.1029   LearningRate 0.0202   Epoch: 11   Global Step: 136670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:39:36,494-Speed 2979.08 samples/sec   Loss 5.0153   LearningRate 0.0202   Epoch: 11   Global Step: 136680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:39:39,837-Speed 3063.55 samples/sec   Loss 5.1525   LearningRate 0.0202   Epoch: 11   Global Step: 136690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:39:43,214-Speed 3033.51 samples/sec   Loss 5.1737   LearningRate 0.0202   Epoch: 11   Global Step: 136700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:39:46,734-Speed 2911.25 samples/sec   Loss 5.1295   LearningRate 0.0202   Epoch: 11   Global Step: 136710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:39:50,086-Speed 3055.72 samples/sec   Loss 5.2685   LearningRate 0.0202   Epoch: 11   Global Step: 136720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:39:53,482-Speed 3016.26 samples/sec   Loss 5.2505   LearningRate 0.0202   Epoch: 11   Global Step: 136730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:39:56,810-Speed 3077.65 samples/sec   Loss 5.1335   LearningRate 0.0202   Epoch: 11   Global Step: 136740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:40:00,174-Speed 3046.24 samples/sec   Loss 5.0794   LearningRate 0.0202   Epoch: 11   Global Step: 136750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-27 14:40:03,545-Speed 3039.25 samples/sec   Loss 5.1768   LearningRate 0.0202   Epoch: 11   Global Step: 136760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:40:06,974-Speed 2987.11 samples/sec   Loss 5.0395   LearningRate 0.0202   Epoch: 11   Global Step: 136770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:40:10,360-Speed 3024.65 samples/sec   Loss 5.1716   LearningRate 0.0202   Epoch: 11   Global Step: 136780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:40:13,748-Speed 3023.63 samples/sec   Loss 5.0142   LearningRate 0.0202   Epoch: 11   Global Step: 136790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:40:17,187-Speed 2978.76 samples/sec   Loss 5.1567   LearningRate 0.0202   Epoch: 11   Global Step: 136800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:40:20,615-Speed 2987.93 samples/sec   Loss 5.2119   LearningRate 0.0202   Epoch: 11   Global Step: 136810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:40:24,032-Speed 2997.24 samples/sec   Loss 5.1398   LearningRate 0.0202   Epoch: 11   Global Step: 136820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:40:27,419-Speed 3024.11 samples/sec   Loss 5.2147   LearningRate 0.0202   Epoch: 11   Global Step: 136830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:40:30,817-Speed 3014.22 samples/sec   Loss 5.1381   LearningRate 0.0202   Epoch: 11   Global Step: 136840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:40:34,274-Speed 2963.61 samples/sec   Loss 5.1540   LearningRate 0.0202   Epoch: 11   Global Step: 136850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:40:37,835-Speed 2875.95 samples/sec   Loss 5.1478   LearningRate 0.0202   Epoch: 11   Global Step: 136860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:40:41,309-Speed 2948.41 samples/sec   Loss 5.2415   LearningRate 0.0202   Epoch: 11   Global Step: 136870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:40:44,688-Speed 3031.69 samples/sec   Loss 5.2125   LearningRate 0.0202   Epoch: 11   Global Step: 136880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:40:48,130-Speed 2976.48 samples/sec   Loss 5.1967   LearningRate 0.0202   Epoch: 11   Global Step: 136890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:40:51,829-Speed 2769.05 samples/sec   Loss 5.2231   LearningRate 0.0202   Epoch: 11   Global Step: 136900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:40:55,211-Speed 3028.68 samples/sec   Loss 5.0780   LearningRate 0.0201   Epoch: 11   Global Step: 136910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:40:58,725-Speed 2915.00 samples/sec   Loss 5.2225   LearningRate 0.0201   Epoch: 11   Global Step: 136920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:02,093-Speed 3041.64 samples/sec   Loss 5.1922   LearningRate 0.0201   Epoch: 11   Global Step: 136930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:05,480-Speed 3024.09 samples/sec   Loss 5.3248   LearningRate 0.0201   Epoch: 11   Global Step: 136940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:08,851-Speed 3038.54 samples/sec   Loss 5.3511   LearningRate 0.0201   Epoch: 11   Global Step: 136950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:12,184-Speed 3073.03 samples/sec   Loss 5.2794   LearningRate 0.0201   Epoch: 11   Global Step: 136960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:15,588-Speed 3008.50 samples/sec   Loss 5.2520   LearningRate 0.0201   Epoch: 11   Global Step: 136970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:18,970-Speed 3028.83 samples/sec   Loss 5.2955   LearningRate 0.0201   Epoch: 11   Global Step: 136980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:22,369-Speed 3014.00 samples/sec   Loss 5.3146   LearningRate 0.0201   Epoch: 11   Global Step: 136990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:25,709-Speed 3066.88 samples/sec   Loss 5.2255   LearningRate 0.0201   Epoch: 11   Global Step: 137000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:29,085-Speed 3033.73 samples/sec   Loss 5.1972   LearningRate 0.0201   Epoch: 11   Global Step: 137010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:32,488-Speed 3010.06 samples/sec   Loss 5.1914   LearningRate 0.0201   Epoch: 11   Global Step: 137020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:35,880-Speed 3019.88 samples/sec   Loss 5.3503   LearningRate 0.0201   Epoch: 11   Global Step: 137030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:39,334-Speed 2965.75 samples/sec   Loss 5.4021   LearningRate 0.0201   Epoch: 11   Global Step: 137040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:42,661-Speed 3078.15 samples/sec   Loss 5.2259   LearningRate 0.0201   Epoch: 11   Global Step: 137050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 14:41:46,035-Speed 3035.80 samples/sec   Loss 5.2076   LearningRate 0.0201   Epoch: 11   Global Step: 137060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:41:49,449-Speed 3001.11 samples/sec   Loss 5.3281   LearningRate 0.0201   Epoch: 11   Global Step: 137070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:41:52,842-Speed 3018.56 samples/sec   Loss 5.2339   LearningRate 0.0201   Epoch: 11   Global Step: 137080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:41:56,214-Speed 3037.54 samples/sec   Loss 5.1699   LearningRate 0.0201   Epoch: 11   Global Step: 137090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:41:59,577-Speed 3045.85 samples/sec   Loss 5.1902   LearningRate 0.0201   Epoch: 11   Global Step: 137100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:42:02,918-Speed 3066.06 samples/sec   Loss 5.3507   LearningRate 0.0201   Epoch: 11   Global Step: 137110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 14:42:06,261-Speed 3063.38 samples/sec   Loss 5.2343   LearningRate 0.0201   Epoch: 11   Global Step: 137120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:42:09,613-Speed 3056.69 samples/sec   Loss 5.3541   LearningRate 0.0201   Epoch: 11   Global Step: 137130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:42:12,976-Speed 3045.16 samples/sec   Loss 5.2710   LearningRate 0.0201   Epoch: 11   Global Step: 137140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:42:16,364-Speed 3023.59 samples/sec   Loss 5.3121   LearningRate 0.0201   Epoch: 11   Global Step: 137150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:42:19,759-Speed 3017.96 samples/sec   Loss 5.2457   LearningRate 0.0201   Epoch: 11   Global Step: 137160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:42:23,156-Speed 3015.19 samples/sec   Loss 5.2978   LearningRate 0.0201   Epoch: 11   Global Step: 137170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:42:26,478-Speed 3083.32 samples/sec   Loss 5.2961   LearningRate 0.0201   Epoch: 11   Global Step: 137180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:42:29,876-Speed 3014.21 samples/sec   Loss 5.3139   LearningRate 0.0200   Epoch: 11   Global Step: 137190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:42:33,309-Speed 2983.59 samples/sec   Loss 5.3153   LearningRate 0.0200   Epoch: 11   Global Step: 137200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:42:36,661-Speed 3056.37 samples/sec   Loss 5.2569   LearningRate 0.0200   Epoch: 11   Global Step: 137210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:42:40,102-Speed 2976.82 samples/sec   Loss 5.3942   LearningRate 0.0200   Epoch: 11   Global Step: 137220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:42:43,478-Speed 3034.39 samples/sec   Loss 5.2945   LearningRate 0.0200   Epoch: 11   Global Step: 137230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:42:46,903-Speed 2990.60 samples/sec   Loss 5.3533   LearningRate 0.0200   Epoch: 11   Global Step: 137240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:42:50,304-Speed 3011.74 samples/sec   Loss 5.4195   LearningRate 0.0200   Epoch: 11   Global Step: 137250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:42:53,728-Speed 2991.75 samples/sec   Loss 5.2325   LearningRate 0.0200   Epoch: 11   Global Step: 137260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:42:57,124-Speed 3015.95 samples/sec   Loss 5.2998   LearningRate 0.0200   Epoch: 11   Global Step: 137270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:43:00,473-Speed 3060.73 samples/sec   Loss 5.2635   LearningRate 0.0200   Epoch: 11   Global Step: 137280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:43:03,807-Speed 3071.44 samples/sec   Loss 5.2622   LearningRate 0.0200   Epoch: 11   Global Step: 137290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:43:07,229-Speed 2993.69 samples/sec   Loss 5.3513   LearningRate 0.0200   Epoch: 11   Global Step: 137300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:43:10,667-Speed 2979.83 samples/sec   Loss 5.3153   LearningRate 0.0200   Epoch: 11   Global Step: 137310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:43:14,163-Speed 2929.59 samples/sec   Loss 5.3985   LearningRate 0.0200   Epoch: 11   Global Step: 137320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:43:17,659-Speed 2930.18 samples/sec   Loss 5.3265   LearningRate 0.0200   Epoch: 11   Global Step: 137330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:43:21,094-Speed 2981.56 samples/sec   Loss 5.4469   LearningRate 0.0200   Epoch: 11   Global Step: 137340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:43:24,502-Speed 3005.48 samples/sec   Loss 5.4590   LearningRate 0.0200   Epoch: 11   Global Step: 137350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:43:27,938-Speed 2981.61 samples/sec   Loss 5.3380   LearningRate 0.0200   Epoch: 11   Global Step: 137360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:43:31,335-Speed 3015.46 samples/sec   Loss 5.2901   LearningRate 0.0200   Epoch: 11   Global Step: 137370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:43:34,812-Speed 2945.34 samples/sec   Loss 5.3761   LearningRate 0.0200   Epoch: 11   Global Step: 137380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:43:38,170-Speed 3050.66 samples/sec   Loss 5.2764   LearningRate 0.0200   Epoch: 11   Global Step: 137390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:43:41,634-Speed 2957.26 samples/sec   Loss 5.3732   LearningRate 0.0200   Epoch: 11   Global Step: 137400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:43:45,010-Speed 3033.34 samples/sec   Loss 5.3739   LearningRate 0.0200   Epoch: 11   Global Step: 137410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:43:48,498-Speed 2937.02 samples/sec   Loss 5.3884   LearningRate 0.0200   Epoch: 11   Global Step: 137420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:43:51,926-Speed 2987.90 samples/sec   Loss 5.4802   LearningRate 0.0200   Epoch: 11   Global Step: 137430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:43:55,256-Speed 3076.14 samples/sec   Loss 5.3557   LearningRate 0.0200   Epoch: 11   Global Step: 137440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:43:58,672-Speed 2998.43 samples/sec   Loss 5.3646   LearningRate 0.0200   Epoch: 11   Global Step: 137450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:02,103-Speed 2985.83 samples/sec   Loss 5.3751   LearningRate 0.0200   Epoch: 11   Global Step: 137460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:05,453-Speed 3057.21 samples/sec   Loss 5.3661   LearningRate 0.0199   Epoch: 11   Global Step: 137470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:08,809-Speed 3052.27 samples/sec   Loss 5.4467   LearningRate 0.0199   Epoch: 11   Global Step: 137480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:12,152-Speed 3063.79 samples/sec   Loss 5.4550   LearningRate 0.0199   Epoch: 11   Global Step: 137490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:15,521-Speed 3040.88 samples/sec   Loss 5.3690   LearningRate 0.0199   Epoch: 11   Global Step: 137500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:18,877-Speed 3052.45 samples/sec   Loss 5.2983   LearningRate 0.0199   Epoch: 11   Global Step: 137510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:22,304-Speed 2988.82 samples/sec   Loss 5.4514   LearningRate 0.0199   Epoch: 11   Global Step: 137520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:25,774-Speed 2951.83 samples/sec   Loss 5.4599   LearningRate 0.0199   Epoch: 11   Global Step: 137530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:29,150-Speed 3033.84 samples/sec   Loss 5.5142   LearningRate 0.0199   Epoch: 11   Global Step: 137540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:44:32,492-Speed 3064.56 samples/sec   Loss 5.4876   LearningRate 0.0199   Epoch: 11   Global Step: 137550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:44:35,861-Speed 3040.05 samples/sec   Loss 5.3939   LearningRate 0.0199   Epoch: 11   Global Step: 137560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:44:39,236-Speed 3035.41 samples/sec   Loss 5.4902   LearningRate 0.0199   Epoch: 11   Global Step: 137570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:43,179-Speed 2598.02 samples/sec   Loss 5.3441   LearningRate 0.0199   Epoch: 11   Global Step: 137580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:46,514-Speed 3070.85 samples/sec   Loss 5.4569   LearningRate 0.0199   Epoch: 11   Global Step: 137590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:49,959-Speed 2973.44 samples/sec   Loss 5.5188   LearningRate 0.0199   Epoch: 11   Global Step: 137600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:54,159-Speed 2438.69 samples/sec   Loss 5.4639   LearningRate 0.0199   Epoch: 11   Global Step: 137610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:44:58,800-Speed 2206.80 samples/sec   Loss 5.3986   LearningRate 0.0199   Epoch: 11   Global Step: 137620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:45:02,228-Speed 2987.96 samples/sec   Loss 5.5036   LearningRate 0.0199   Epoch: 11   Global Step: 137630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:45:05,583-Speed 3053.18 samples/sec   Loss 5.4451   LearningRate 0.0199   Epoch: 11   Global Step: 137640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:45:08,979-Speed 3016.10 samples/sec   Loss 5.4637   LearningRate 0.0199   Epoch: 11   Global Step: 137650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:45:12,424-Speed 2973.96 samples/sec   Loss 5.4899   LearningRate 0.0199   Epoch: 11   Global Step: 137660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:45:15,787-Speed 3045.49 samples/sec   Loss 5.4531   LearningRate 0.0199   Epoch: 11   Global Step: 137670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:45:19,160-Speed 3036.90 samples/sec   Loss 5.4235   LearningRate 0.0199   Epoch: 11   Global Step: 137680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:45:22,581-Speed 2993.71 samples/sec   Loss 5.3859   LearningRate 0.0199   Epoch: 11   Global Step: 137690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:45:25,970-Speed 3022.17 samples/sec   Loss 5.5060   LearningRate 0.0199   Epoch: 11   Global Step: 137700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:45:29,301-Speed 3075.75 samples/sec   Loss 5.5584   LearningRate 0.0199   Epoch: 11   Global Step: 137710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:45:32,655-Speed 3053.53 samples/sec   Loss 5.4827   LearningRate 0.0199   Epoch: 11   Global Step: 137720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:45:36,008-Speed 3055.53 samples/sec   Loss 5.5161   LearningRate 0.0199   Epoch: 11   Global Step: 137730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:45:39,395-Speed 3023.60 samples/sec   Loss 5.4428   LearningRate 0.0199   Epoch: 11   Global Step: 137740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:45:42,809-Speed 3000.49 samples/sec   Loss 5.4874   LearningRate 0.0198   Epoch: 11   Global Step: 137750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:45:46,204-Speed 3017.44 samples/sec   Loss 5.4840   LearningRate 0.0198   Epoch: 11   Global Step: 137760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:45:49,685-Speed 2942.48 samples/sec   Loss 5.6238   LearningRate 0.0198   Epoch: 11   Global Step: 137770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:45:53,161-Speed 2946.30 samples/sec   Loss 5.5097   LearningRate 0.0198   Epoch: 11   Global Step: 137780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:45:56,600-Speed 2979.26 samples/sec   Loss 5.5873   LearningRate 0.0198   Epoch: 11   Global Step: 137790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:00,033-Speed 2983.71 samples/sec   Loss 5.5386   LearningRate 0.0198   Epoch: 11   Global Step: 137800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:03,453-Speed 2995.55 samples/sec   Loss 5.4895   LearningRate 0.0198   Epoch: 11   Global Step: 137810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:06,887-Speed 2983.05 samples/sec   Loss 5.5774   LearningRate 0.0198   Epoch: 11   Global Step: 137820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:10,327-Speed 2977.96 samples/sec   Loss 5.5517   LearningRate 0.0198   Epoch: 11   Global Step: 137830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:13,752-Speed 2990.52 samples/sec   Loss 5.6453   LearningRate 0.0198   Epoch: 11   Global Step: 137840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:46:17,223-Speed 2950.89 samples/sec   Loss 5.5233   LearningRate 0.0198   Epoch: 11   Global Step: 137850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:46:20,544-Speed 3084.03 samples/sec   Loss 5.5205   LearningRate 0.0198   Epoch: 11   Global Step: 137860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:23,928-Speed 3027.14 samples/sec   Loss 5.5657   LearningRate 0.0198   Epoch: 11   Global Step: 137870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:27,435-Speed 2920.38 samples/sec   Loss 5.5550   LearningRate 0.0198   Epoch: 11   Global Step: 137880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:30,845-Speed 3004.17 samples/sec   Loss 5.4983   LearningRate 0.0198   Epoch: 11   Global Step: 137890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:34,277-Speed 2984.37 samples/sec   Loss 5.4022   LearningRate 0.0198   Epoch: 11   Global Step: 137900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:37,704-Speed 2989.11 samples/sec   Loss 5.5772   LearningRate 0.0198   Epoch: 11   Global Step: 137910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:41,170-Speed 2955.34 samples/sec   Loss 5.5382   LearningRate 0.0198   Epoch: 11   Global Step: 137920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:44,527-Speed 3050.68 samples/sec   Loss 5.5660   LearningRate 0.0198   Epoch: 11   Global Step: 137930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:47,845-Speed 3087.38 samples/sec   Loss 5.5428   LearningRate 0.0198   Epoch: 11   Global Step: 137940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:51,276-Speed 2986.02 samples/sec   Loss 5.4947   LearningRate 0.0198   Epoch: 11   Global Step: 137950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:46:54,603-Speed 3078.49 samples/sec   Loss 5.4821   LearningRate 0.0198   Epoch: 11   Global Step: 137960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:46:58,624-Speed 2546.81 samples/sec   Loss 5.6513   LearningRate 0.0198   Epoch: 11   Global Step: 137970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:47:02,025-Speed 3012.06 samples/sec   Loss 5.5341   LearningRate 0.0198   Epoch: 11   Global Step: 137980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:47:06,024-Speed 2561.89 samples/sec   Loss 5.7075   LearningRate 0.0198   Epoch: 11   Global Step: 137990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:47:09,466-Speed 2975.58 samples/sec   Loss 5.6016   LearningRate 0.0198   Epoch: 11   Global Step: 138000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:47:12,840-Speed 3036.28 samples/sec   Loss 5.5152   LearningRate 0.0198   Epoch: 11   Global Step: 138010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:47:16,176-Speed 3070.66 samples/sec   Loss 5.4685   LearningRate 0.0197   Epoch: 11   Global Step: 138020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:47:19,565-Speed 3021.83 samples/sec   Loss 5.5833   LearningRate 0.0197   Epoch: 11   Global Step: 138030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:47:23,007-Speed 2976.46 samples/sec   Loss 5.5273   LearningRate 0.0197   Epoch: 11   Global Step: 138040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:47:26,386-Speed 3030.58 samples/sec   Loss 5.6117   LearningRate 0.0197   Epoch: 11   Global Step: 138050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:47:29,784-Speed 3014.96 samples/sec   Loss 5.6186   LearningRate 0.0197   Epoch: 11   Global Step: 138060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 14:47:33,080-Speed 3107.78 samples/sec   Loss 5.6363   LearningRate 0.0197   Epoch: 11   Global Step: 138070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:47:36,440-Speed 3049.15 samples/sec   Loss 5.5469   LearningRate 0.0197   Epoch: 11   Global Step: 138080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:47:39,772-Speed 3074.34 samples/sec   Loss 5.5419   LearningRate 0.0197   Epoch: 11   Global Step: 138090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:47:43,178-Speed 3007.22 samples/sec   Loss 5.6212   LearningRate 0.0197   Epoch: 11   Global Step: 138100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:47:46,517-Speed 3066.89 samples/sec   Loss 5.5637   LearningRate 0.0197   Epoch: 11   Global Step: 138110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:47:49,875-Speed 3050.49 samples/sec   Loss 5.6273   LearningRate 0.0197   Epoch: 11   Global Step: 138120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:47:53,384-Speed 2919.31 samples/sec   Loss 5.5590   LearningRate 0.0197   Epoch: 11   Global Step: 138130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:47:56,715-Speed 3074.71 samples/sec   Loss 5.5391   LearningRate 0.0197   Epoch: 11   Global Step: 138140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:48:00,020-Speed 3100.00 samples/sec   Loss 5.6830   LearningRate 0.0197   Epoch: 11   Global Step: 138150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:48:03,462-Speed 2975.88 samples/sec   Loss 5.5568   LearningRate 0.0197   Epoch: 11   Global Step: 138160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:48:06,974-Speed 2918.08 samples/sec   Loss 5.5790   LearningRate 0.0197   Epoch: 11   Global Step: 138170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:48:10,348-Speed 3035.72 samples/sec   Loss 5.6428   LearningRate 0.0197   Epoch: 11   Global Step: 138180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:48:13,705-Speed 3050.91 samples/sec   Loss 5.6638   LearningRate 0.0197   Epoch: 11   Global Step: 138190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:48:17,132-Speed 2989.67 samples/sec   Loss 5.6102   LearningRate 0.0197   Epoch: 11   Global Step: 138200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:48:20,522-Speed 3021.54 samples/sec   Loss 5.6018   LearningRate 0.0197   Epoch: 11   Global Step: 138210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:48:23,896-Speed 3035.56 samples/sec   Loss 5.7315   LearningRate 0.0197   Epoch: 11   Global Step: 138220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:48:27,273-Speed 3033.60 samples/sec   Loss 5.7146   LearningRate 0.0197   Epoch: 11   Global Step: 138230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:48:30,658-Speed 3026.15 samples/sec   Loss 5.5411   LearningRate 0.0197   Epoch: 11   Global Step: 138240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:48:34,063-Speed 3007.59 samples/sec   Loss 5.6156   LearningRate 0.0197   Epoch: 11   Global Step: 138250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:48:37,446-Speed 3028.12 samples/sec   Loss 5.5322   LearningRate 0.0197   Epoch: 11   Global Step: 138260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:48:40,774-Speed 3077.93 samples/sec   Loss 5.6076   LearningRate 0.0197   Epoch: 11   Global Step: 138270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:48:44,152-Speed 3032.07 samples/sec   Loss 5.6223   LearningRate 0.0197   Epoch: 11   Global Step: 138280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:48:47,518-Speed 3043.03 samples/sec   Loss 5.6211   LearningRate 0.0197   Epoch: 11   Global Step: 138290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:48:50,917-Speed 3013.78 samples/sec   Loss 5.6517   LearningRate 0.0196   Epoch: 11   Global Step: 138300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:48:54,236-Speed 3085.55 samples/sec   Loss 5.6574   LearningRate 0.0196   Epoch: 11   Global Step: 138310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:48:57,639-Speed 3010.78 samples/sec   Loss 5.5269   LearningRate 0.0196   Epoch: 11   Global Step: 138320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:49:01,048-Speed 3004.44 samples/sec   Loss 5.6633   LearningRate 0.0196   Epoch: 11   Global Step: 138330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:49:04,473-Speed 2990.55 samples/sec   Loss 5.5995   LearningRate 0.0196   Epoch: 11   Global Step: 138340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:49:07,899-Speed 2990.50 samples/sec   Loss 5.6907   LearningRate 0.0196   Epoch: 11   Global Step: 138350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:49:11,279-Speed 3030.12 samples/sec   Loss 5.6272   LearningRate 0.0196   Epoch: 11   Global Step: 138360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:49:14,756-Speed 2945.89 samples/sec   Loss 5.7025   LearningRate 0.0196   Epoch: 11   Global Step: 138370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:49:18,159-Speed 3010.30 samples/sec   Loss 5.7014   LearningRate 0.0196   Epoch: 11   Global Step: 138380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:49:21,532-Speed 3037.01 samples/sec   Loss 5.7378   LearningRate 0.0196   Epoch: 11   Global Step: 138390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:49:24,932-Speed 3012.84 samples/sec   Loss 5.7091   LearningRate 0.0196   Epoch: 11   Global Step: 138400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:49:28,295-Speed 3045.02 samples/sec   Loss 5.6279   LearningRate 0.0196   Epoch: 11   Global Step: 138410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:49:31,674-Speed 3032.14 samples/sec   Loss 5.6825   LearningRate 0.0196   Epoch: 11   Global Step: 138420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:49:35,089-Speed 2998.95 samples/sec   Loss 5.6655   LearningRate 0.0196   Epoch: 11   Global Step: 138430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:49:38,457-Speed 3042.16 samples/sec   Loss 5.6469   LearningRate 0.0196   Epoch: 11   Global Step: 138440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:49:41,797-Speed 3066.53 samples/sec   Loss 5.7239   LearningRate 0.0196   Epoch: 11   Global Step: 138450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:49:45,189-Speed 3019.16 samples/sec   Loss 5.6730   LearningRate 0.0196   Epoch: 11   Global Step: 138460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:49:48,594-Speed 3008.41 samples/sec   Loss 5.7010   LearningRate 0.0196   Epoch: 11   Global Step: 138470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:49:51,967-Speed 3037.23 samples/sec   Loss 5.5896   LearningRate 0.0196   Epoch: 11   Global Step: 138480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:49:55,380-Speed 3001.34 samples/sec   Loss 5.7438   LearningRate 0.0196   Epoch: 11   Global Step: 138490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:49:58,761-Speed 3029.11 samples/sec   Loss 5.5900   LearningRate 0.0196   Epoch: 11   Global Step: 138500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:50:02,144-Speed 3028.15 samples/sec   Loss 5.8236   LearningRate 0.0196   Epoch: 11   Global Step: 138510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:50:05,523-Speed 3031.03 samples/sec   Loss 5.5781   LearningRate 0.0196   Epoch: 11   Global Step: 138520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:50:09,032-Speed 2919.68 samples/sec   Loss 5.7521   LearningRate 0.0196   Epoch: 11   Global Step: 138530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:50:12,468-Speed 2981.07 samples/sec   Loss 5.7029   LearningRate 0.0196   Epoch: 11   Global Step: 138540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:50:15,806-Speed 3068.41 samples/sec   Loss 5.7101   LearningRate 0.0196   Epoch: 11   Global Step: 138550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:50:19,220-Speed 3000.02 samples/sec   Loss 5.6632   LearningRate 0.0196   Epoch: 11   Global Step: 138560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:50:22,651-Speed 2985.72 samples/sec   Loss 5.6637   LearningRate 0.0196   Epoch: 11   Global Step: 138570   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:50:26,053-Speed 3010.71 samples/sec   Loss 5.5923   LearningRate 0.0196   Epoch: 11   Global Step: 138580   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:50:29,423-Speed 3039.90 samples/sec   Loss 5.7790   LearningRate 0.0195   Epoch: 11   Global Step: 138590   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:50:32,735-Speed 3092.57 samples/sec   Loss 5.7628   LearningRate 0.0195   Epoch: 11   Global Step: 138600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:50:36,091-Speed 3051.66 samples/sec   Loss 5.7305   LearningRate 0.0195   Epoch: 11   Global Step: 138610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:50:39,478-Speed 3024.88 samples/sec   Loss 5.7328   LearningRate 0.0195   Epoch: 11   Global Step: 138620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:50:42,830-Speed 3055.56 samples/sec   Loss 5.7315   LearningRate 0.0195   Epoch: 11   Global Step: 138630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:50:46,280-Speed 2968.49 samples/sec   Loss 5.8123   LearningRate 0.0195   Epoch: 11   Global Step: 138640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:50:49,677-Speed 3017.25 samples/sec   Loss 5.6773   LearningRate 0.0195   Epoch: 11   Global Step: 138650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:50:53,044-Speed 3042.12 samples/sec   Loss 5.7607   LearningRate 0.0195   Epoch: 11   Global Step: 138660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:50:56,418-Speed 3035.35 samples/sec   Loss 5.8770   LearningRate 0.0195   Epoch: 11   Global Step: 138670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:50:59,889-Speed 2951.45 samples/sec   Loss 5.7453   LearningRate 0.0195   Epoch: 11   Global Step: 138680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:03,371-Speed 2941.67 samples/sec   Loss 5.6513   LearningRate 0.0195   Epoch: 11   Global Step: 138690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:06,808-Speed 2980.25 samples/sec   Loss 5.7829   LearningRate 0.0195   Epoch: 11   Global Step: 138700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:51:10,140-Speed 3074.39 samples/sec   Loss 5.7560   LearningRate 0.0195   Epoch: 11   Global Step: 138710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:13,616-Speed 2946.81 samples/sec   Loss 5.7700   LearningRate 0.0195   Epoch: 11   Global Step: 138720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:16,964-Speed 3059.43 samples/sec   Loss 5.7331   LearningRate 0.0195   Epoch: 11   Global Step: 138730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:20,357-Speed 3019.54 samples/sec   Loss 5.7523   LearningRate 0.0195   Epoch: 11   Global Step: 138740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:23,737-Speed 3030.44 samples/sec   Loss 5.7052   LearningRate 0.0195   Epoch: 11   Global Step: 138750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:27,049-Speed 3092.39 samples/sec   Loss 5.7168   LearningRate 0.0195   Epoch: 11   Global Step: 138760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:30,367-Speed 3087.03 samples/sec   Loss 5.6590   LearningRate 0.0195   Epoch: 11   Global Step: 138770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:33,717-Speed 3058.43 samples/sec   Loss 5.8683   LearningRate 0.0195   Epoch: 11   Global Step: 138780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:37,118-Speed 3011.14 samples/sec   Loss 5.8081   LearningRate 0.0195   Epoch: 11   Global Step: 138790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:40,604-Speed 2938.49 samples/sec   Loss 5.7212   LearningRate 0.0195   Epoch: 11   Global Step: 138800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:43,990-Speed 3025.10 samples/sec   Loss 5.6742   LearningRate 0.0195   Epoch: 11   Global Step: 138810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:51:47,406-Speed 2998.79 samples/sec   Loss 5.6505   LearningRate 0.0195   Epoch: 11   Global Step: 138820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:51:50,759-Speed 3055.04 samples/sec   Loss 5.7423   LearningRate 0.0195   Epoch: 11   Global Step: 138830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:51:54,084-Speed 3080.64 samples/sec   Loss 5.7539   LearningRate 0.0195   Epoch: 11   Global Step: 138840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:51:57,508-Speed 2991.36 samples/sec   Loss 5.7930   LearningRate 0.0195   Epoch: 11   Global Step: 138850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:52:00,821-Speed 3091.92 samples/sec   Loss 5.6626   LearningRate 0.0195   Epoch: 11   Global Step: 138860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:52:04,210-Speed 3022.33 samples/sec   Loss 5.7423   LearningRate 0.0194   Epoch: 11   Global Step: 138870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:52:07,557-Speed 3059.63 samples/sec   Loss 5.7292   LearningRate 0.0194   Epoch: 11   Global Step: 138880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:52:11,024-Speed 2955.23 samples/sec   Loss 5.6760   LearningRate 0.0194   Epoch: 11   Global Step: 138890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:52:14,403-Speed 3031.28 samples/sec   Loss 5.8594   LearningRate 0.0194   Epoch: 11   Global Step: 138900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:52:17,741-Speed 3068.18 samples/sec   Loss 5.8939   LearningRate 0.0194   Epoch: 11   Global Step: 138910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:52:21,151-Speed 3004.83 samples/sec   Loss 5.7381   LearningRate 0.0194   Epoch: 11   Global Step: 138920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:52:24,554-Speed 3010.04 samples/sec   Loss 5.6916   LearningRate 0.0194   Epoch: 11   Global Step: 138930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:52:27,973-Speed 2995.79 samples/sec   Loss 5.7907   LearningRate 0.0194   Epoch: 11   Global Step: 138940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:52:31,328-Speed 3053.27 samples/sec   Loss 5.7027   LearningRate 0.0194   Epoch: 11   Global Step: 138950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:52:34,737-Speed 3004.36 samples/sec   Loss 5.8906   LearningRate 0.0194   Epoch: 11   Global Step: 138960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:52:38,114-Speed 3032.85 samples/sec   Loss 5.7975   LearningRate 0.0194   Epoch: 11   Global Step: 138970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:52:41,582-Speed 2954.70 samples/sec   Loss 5.8054   LearningRate 0.0194   Epoch: 11   Global Step: 138980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:52:45,051-Speed 2952.30 samples/sec   Loss 5.8812   LearningRate 0.0194   Epoch: 11   Global Step: 138990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:52:48,452-Speed 3011.72 samples/sec   Loss 5.8561   LearningRate 0.0194   Epoch: 11   Global Step: 139000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:52:51,841-Speed 3022.81 samples/sec   Loss 5.8409   LearningRate 0.0194   Epoch: 11   Global Step: 139010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:52:55,237-Speed 3016.82 samples/sec   Loss 5.8256   LearningRate 0.0194   Epoch: 11   Global Step: 139020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:52:58,690-Speed 2967.56 samples/sec   Loss 5.7133   LearningRate 0.0194   Epoch: 11   Global Step: 139030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 14:53:02,091-Speed 3012.37 samples/sec   Loss 5.9397   LearningRate 0.0194   Epoch: 11   Global Step: 139040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:53:05,458-Speed 3041.81 samples/sec   Loss 5.7744   LearningRate 0.0194   Epoch: 11   Global Step: 139050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:53:08,819-Speed 3047.51 samples/sec   Loss 5.8335   LearningRate 0.0194   Epoch: 11   Global Step: 139060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:53:12,189-Speed 3039.49 samples/sec   Loss 5.7979   LearningRate 0.0194   Epoch: 11   Global Step: 139070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:53:15,533-Speed 3062.74 samples/sec   Loss 5.7731   LearningRate 0.0194   Epoch: 11   Global Step: 139080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:53:18,948-Speed 3000.03 samples/sec   Loss 5.8265   LearningRate 0.0194   Epoch: 11   Global Step: 139090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:53:22,333-Speed 3025.82 samples/sec   Loss 5.8536   LearningRate 0.0194   Epoch: 11   Global Step: 139100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:53:25,703-Speed 3039.21 samples/sec   Loss 5.7487   LearningRate 0.0194   Epoch: 11   Global Step: 139110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:53:29,109-Speed 3007.44 samples/sec   Loss 5.8134   LearningRate 0.0194   Epoch: 11   Global Step: 139120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:53:32,555-Speed 2972.11 samples/sec   Loss 5.8991   LearningRate 0.0194   Epoch: 11   Global Step: 139130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:53:35,929-Speed 3036.10 samples/sec   Loss 5.7481   LearningRate 0.0194   Epoch: 11   Global Step: 139140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:53:39,339-Speed 3004.27 samples/sec   Loss 5.8002   LearningRate 0.0193   Epoch: 11   Global Step: 139150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:53:42,788-Speed 2969.30 samples/sec   Loss 5.7329   LearningRate 0.0193   Epoch: 11   Global Step: 139160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:53:46,175-Speed 3023.96 samples/sec   Loss 5.8569   LearningRate 0.0193   Epoch: 11   Global Step: 139170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:53:49,578-Speed 3009.96 samples/sec   Loss 5.8176   LearningRate 0.0193   Epoch: 11   Global Step: 139180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:53:52,976-Speed 3015.20 samples/sec   Loss 5.8354   LearningRate 0.0193   Epoch: 11   Global Step: 139190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:53:56,463-Speed 2937.50 samples/sec   Loss 5.8621   LearningRate 0.0193   Epoch: 11   Global Step: 139200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:53:59,863-Speed 3012.85 samples/sec   Loss 5.9518   LearningRate 0.0193   Epoch: 11   Global Step: 139210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:54:03,327-Speed 2957.23 samples/sec   Loss 5.8582   LearningRate 0.0193   Epoch: 11   Global Step: 139220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:54:06,649-Speed 3084.29 samples/sec   Loss 5.8362   LearningRate 0.0193   Epoch: 11   Global Step: 139230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:54:09,980-Speed 3074.35 samples/sec   Loss 5.8990   LearningRate 0.0193   Epoch: 11   Global Step: 139240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:54:13,339-Speed 3051.24 samples/sec   Loss 5.8599   LearningRate 0.0193   Epoch: 11   Global Step: 139250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:54:16,746-Speed 3005.95 samples/sec   Loss 5.8871   LearningRate 0.0193   Epoch: 11   Global Step: 139260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:54:20,173-Speed 2988.69 samples/sec   Loss 5.8866   LearningRate 0.0193   Epoch: 11   Global Step: 139270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:54:23,591-Speed 2997.59 samples/sec   Loss 5.8663   LearningRate 0.0193   Epoch: 11   Global Step: 139280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:54:26,999-Speed 3005.23 samples/sec   Loss 5.8925   LearningRate 0.0193   Epoch: 11   Global Step: 139290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:54:30,392-Speed 3018.77 samples/sec   Loss 5.8887   LearningRate 0.0193   Epoch: 11   Global Step: 139300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:54:33,792-Speed 3012.89 samples/sec   Loss 5.9625   LearningRate 0.0193   Epoch: 11   Global Step: 139310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:54:37,151-Speed 3049.38 samples/sec   Loss 5.8520   LearningRate 0.0193   Epoch: 11   Global Step: 139320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:54:40,510-Speed 3049.77 samples/sec   Loss 5.7953   LearningRate 0.0193   Epoch: 11   Global Step: 139330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:54:43,914-Speed 3009.23 samples/sec   Loss 5.8138   LearningRate 0.0193   Epoch: 11   Global Step: 139340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:54:47,351-Speed 2980.29 samples/sec   Loss 5.9564   LearningRate 0.0193   Epoch: 11   Global Step: 139350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:54:50,735-Speed 3025.87 samples/sec   Loss 5.9914   LearningRate 0.0193   Epoch: 11   Global Step: 139360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:54:54,129-Speed 3018.59 samples/sec   Loss 5.9067   LearningRate 0.0193   Epoch: 11   Global Step: 139370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:54:57,524-Speed 3017.37 samples/sec   Loss 5.8703   LearningRate 0.0193   Epoch: 11   Global Step: 139380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:00,868-Speed 3062.74 samples/sec   Loss 5.8370   LearningRate 0.0193   Epoch: 11   Global Step: 139390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:04,220-Speed 3055.37 samples/sec   Loss 5.8760   LearningRate 0.0193   Epoch: 11   Global Step: 139400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:07,685-Speed 2956.29 samples/sec   Loss 5.8519   LearningRate 0.0193   Epoch: 11   Global Step: 139410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:11,092-Speed 3006.60 samples/sec   Loss 5.8751   LearningRate 0.0193   Epoch: 11   Global Step: 139420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:14,416-Speed 3082.06 samples/sec   Loss 5.8622   LearningRate 0.0192   Epoch: 11   Global Step: 139430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:17,887-Speed 2950.92 samples/sec   Loss 5.9286   LearningRate 0.0192   Epoch: 11   Global Step: 139440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:21,289-Speed 3010.61 samples/sec   Loss 5.7764   LearningRate 0.0192   Epoch: 11   Global Step: 139450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:24,654-Speed 3043.88 samples/sec   Loss 5.9829   LearningRate 0.0192   Epoch: 11   Global Step: 139460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:28,037-Speed 3027.31 samples/sec   Loss 5.8843   LearningRate 0.0192   Epoch: 11   Global Step: 139470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:31,424-Speed 3024.10 samples/sec   Loss 6.0273   LearningRate 0.0192   Epoch: 11   Global Step: 139480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:34,816-Speed 3019.94 samples/sec   Loss 5.9031   LearningRate 0.0192   Epoch: 11   Global Step: 139490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:38,166-Speed 3057.99 samples/sec   Loss 5.8658   LearningRate 0.0192   Epoch: 11   Global Step: 139500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:41,532-Speed 3043.16 samples/sec   Loss 5.8872   LearningRate 0.0192   Epoch: 11   Global Step: 139510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:44,879-Speed 3060.62 samples/sec   Loss 5.9555   LearningRate 0.0192   Epoch: 11   Global Step: 139520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:48,294-Speed 2999.15 samples/sec   Loss 5.9832   LearningRate 0.0192   Epoch: 11   Global Step: 139530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 14:55:51,670-Speed 3033.94 samples/sec   Loss 5.8798   LearningRate 0.0192   Epoch: 11   Global Step: 139540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:55:55,117-Speed 2972.37 samples/sec   Loss 5.9329   LearningRate 0.0192   Epoch: 11   Global Step: 139550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:55:58,588-Speed 2950.77 samples/sec   Loss 5.9789   LearningRate 0.0192   Epoch: 11   Global Step: 139560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:56:01,986-Speed 3014.32 samples/sec   Loss 5.8612   LearningRate 0.0192   Epoch: 11   Global Step: 139570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:56:05,341-Speed 3053.50 samples/sec   Loss 5.8830   LearningRate 0.0192   Epoch: 11   Global Step: 139580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:56:08,798-Speed 2963.02 samples/sec   Loss 5.9640   LearningRate 0.0192   Epoch: 11   Global Step: 139590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:56:12,170-Speed 3040.33 samples/sec   Loss 5.8746   LearningRate 0.0192   Epoch: 11   Global Step: 139600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:56:15,530-Speed 3048.11 samples/sec   Loss 5.8909   LearningRate 0.0192   Epoch: 11   Global Step: 139610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:56:18,920-Speed 3021.81 samples/sec   Loss 6.0621   LearningRate 0.0192   Epoch: 11   Global Step: 139620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:56:22,277-Speed 3051.55 samples/sec   Loss 5.8834   LearningRate 0.0192   Epoch: 11   Global Step: 139630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:56:25,655-Speed 3031.59 samples/sec   Loss 5.9950   LearningRate 0.0192   Epoch: 11   Global Step: 139640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:56:29,145-Speed 2935.16 samples/sec   Loss 6.0066   LearningRate 0.0192   Epoch: 11   Global Step: 139650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:56:32,507-Speed 3046.70 samples/sec   Loss 5.8420   LearningRate 0.0192   Epoch: 11   Global Step: 139660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:56:35,855-Speed 3059.28 samples/sec   Loss 5.9243   LearningRate 0.0192   Epoch: 11   Global Step: 139670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:56:39,204-Speed 3058.80 samples/sec   Loss 6.0498   LearningRate 0.0192   Epoch: 11   Global Step: 139680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:56:42,667-Speed 2958.00 samples/sec   Loss 5.8945   LearningRate 0.0192   Epoch: 11   Global Step: 139690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:56:46,028-Speed 3047.61 samples/sec   Loss 5.9249   LearningRate 0.0192   Epoch: 11   Global Step: 139700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:56:49,386-Speed 3050.26 samples/sec   Loss 5.9389   LearningRate 0.0191   Epoch: 11   Global Step: 139710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:56:52,700-Speed 3090.86 samples/sec   Loss 5.9791   LearningRate 0.0191   Epoch: 11   Global Step: 139720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:56:56,099-Speed 3013.06 samples/sec   Loss 5.9489   LearningRate 0.0191   Epoch: 11   Global Step: 139730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:56:59,522-Speed 2992.20 samples/sec   Loss 6.0438   LearningRate 0.0191   Epoch: 11   Global Step: 139740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:02,851-Speed 3077.34 samples/sec   Loss 5.9562   LearningRate 0.0191   Epoch: 11   Global Step: 139750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:06,298-Speed 2971.32 samples/sec   Loss 5.9851   LearningRate 0.0191   Epoch: 11   Global Step: 139760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:09,642-Speed 3062.83 samples/sec   Loss 6.0628   LearningRate 0.0191   Epoch: 11   Global Step: 139770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:13,024-Speed 3029.97 samples/sec   Loss 5.8748   LearningRate 0.0191   Epoch: 11   Global Step: 139780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:16,350-Speed 3079.21 samples/sec   Loss 5.9036   LearningRate 0.0191   Epoch: 11   Global Step: 139790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:19,742-Speed 3019.19 samples/sec   Loss 5.9950   LearningRate 0.0191   Epoch: 11   Global Step: 139800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:23,090-Speed 3059.61 samples/sec   Loss 5.9238   LearningRate 0.0191   Epoch: 11   Global Step: 139810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:26,398-Speed 3096.90 samples/sec   Loss 6.0192   LearningRate 0.0191   Epoch: 11   Global Step: 139820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:57:29,777-Speed 3031.54 samples/sec   Loss 5.9465   LearningRate 0.0191   Epoch: 11   Global Step: 139830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:57:33,146-Speed 3040.13 samples/sec   Loss 5.8789   LearningRate 0.0191   Epoch: 11   Global Step: 139840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:36,498-Speed 3056.01 samples/sec   Loss 5.8806   LearningRate 0.0191   Epoch: 11   Global Step: 139850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:39,939-Speed 2976.33 samples/sec   Loss 5.9684   LearningRate 0.0191   Epoch: 11   Global Step: 139860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:43,381-Speed 2976.67 samples/sec   Loss 5.8972   LearningRate 0.0191   Epoch: 11   Global Step: 139870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:46,699-Speed 3087.33 samples/sec   Loss 5.9157   LearningRate 0.0191   Epoch: 11   Global Step: 139880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:50,091-Speed 3019.53 samples/sec   Loss 5.9936   LearningRate 0.0191   Epoch: 11   Global Step: 139890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:53,469-Speed 3031.75 samples/sec   Loss 5.9588   LearningRate 0.0191   Epoch: 11   Global Step: 139900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:57:56,958-Speed 2936.27 samples/sec   Loss 5.9053   LearningRate 0.0191   Epoch: 11   Global Step: 139910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:58:00,377-Speed 2995.61 samples/sec   Loss 5.9290   LearningRate 0.0191   Epoch: 11   Global Step: 139920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:58:03,711-Speed 3072.98 samples/sec   Loss 6.1031   LearningRate 0.0191   Epoch: 11   Global Step: 139930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:58:07,174-Speed 2957.62 samples/sec   Loss 6.0369   LearningRate 0.0191   Epoch: 11   Global Step: 139940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:10,489-Speed 3090.28 samples/sec   Loss 5.9934   LearningRate 0.0191   Epoch: 11   Global Step: 139950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:13,852-Speed 3045.73 samples/sec   Loss 5.8644   LearningRate 0.0191   Epoch: 11   Global Step: 139960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:17,204-Speed 3054.97 samples/sec   Loss 5.9352   LearningRate 0.0191   Epoch: 11   Global Step: 139970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:20,532-Speed 3078.50 samples/sec   Loss 5.8942   LearningRate 0.0191   Epoch: 11   Global Step: 139980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:23,877-Speed 3061.86 samples/sec   Loss 5.9514   LearningRate 0.0191   Epoch: 11   Global Step: 139990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:27,281-Speed 3009.07 samples/sec   Loss 6.0130   LearningRate 0.0190   Epoch: 11   Global Step: 140000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:30,772-Speed 2934.27 samples/sec   Loss 5.9340   LearningRate 0.0190   Epoch: 11   Global Step: 140010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:34,139-Speed 3041.64 samples/sec   Loss 6.0000   LearningRate 0.0190   Epoch: 11   Global Step: 140020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:37,542-Speed 3009.90 samples/sec   Loss 5.9757   LearningRate 0.0190   Epoch: 11   Global Step: 140030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:41,003-Speed 2959.40 samples/sec   Loss 6.0116   LearningRate 0.0190   Epoch: 11   Global Step: 140040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 14:58:44,367-Speed 3045.61 samples/sec   Loss 6.0382   LearningRate 0.0190   Epoch: 11   Global Step: 140050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:47,796-Speed 2986.50 samples/sec   Loss 6.0646   LearningRate 0.0190   Epoch: 11   Global Step: 140060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:51,182-Speed 3025.27 samples/sec   Loss 5.9654   LearningRate 0.0190   Epoch: 11   Global Step: 140070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:54,676-Speed 2931.75 samples/sec   Loss 5.9663   LearningRate 0.0190   Epoch: 11   Global Step: 140080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:58:58,026-Speed 3057.29 samples/sec   Loss 5.9802   LearningRate 0.0190   Epoch: 11   Global Step: 140090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:59:01,483-Speed 2963.17 samples/sec   Loss 6.0209   LearningRate 0.0190   Epoch: 11   Global Step: 140100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:59:04,838-Speed 3052.53 samples/sec   Loss 5.9351   LearningRate 0.0190   Epoch: 11   Global Step: 140110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:59:08,181-Speed 3064.24 samples/sec   Loss 5.9864   LearningRate 0.0190   Epoch: 11   Global Step: 140120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:59:11,550-Speed 3040.92 samples/sec   Loss 6.0388   LearningRate 0.0190   Epoch: 11   Global Step: 140130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:59:14,952-Speed 3011.07 samples/sec   Loss 5.9593   LearningRate 0.0190   Epoch: 11   Global Step: 140140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:59:18,391-Speed 2978.30 samples/sec   Loss 5.9627   LearningRate 0.0190   Epoch: 11   Global Step: 140150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:59:21,742-Speed 3057.12 samples/sec   Loss 6.1339   LearningRate 0.0190   Epoch: 11   Global Step: 140160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:59:25,136-Speed 3018.22 samples/sec   Loss 6.0658   LearningRate 0.0190   Epoch: 11   Global Step: 140170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:59:28,461-Speed 3079.97 samples/sec   Loss 5.9289   LearningRate 0.0190   Epoch: 11   Global Step: 140180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:59:31,797-Speed 3070.99 samples/sec   Loss 6.0725   LearningRate 0.0190   Epoch: 11   Global Step: 140190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:59:35,114-Speed 3088.12 samples/sec   Loss 5.9151   LearningRate 0.0190   Epoch: 11   Global Step: 140200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:59:38,515-Speed 3012.61 samples/sec   Loss 5.9890   LearningRate 0.0190   Epoch: 11   Global Step: 140210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:59:41,843-Speed 3077.70 samples/sec   Loss 5.9571   LearningRate 0.0190   Epoch: 11   Global Step: 140220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:59:45,201-Speed 3050.90 samples/sec   Loss 6.0901   LearningRate 0.0190   Epoch: 11   Global Step: 140230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:59:48,551-Speed 3057.03 samples/sec   Loss 5.9945   LearningRate 0.0190   Epoch: 11   Global Step: 140240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 14:59:51,971-Speed 2994.88 samples/sec   Loss 6.0932   LearningRate 0.0190   Epoch: 11   Global Step: 140250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:59:55,359-Speed 3023.67 samples/sec   Loss 5.9824   LearningRate 0.0190   Epoch: 11   Global Step: 140260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 14:59:58,774-Speed 2999.63 samples/sec   Loss 5.8586   LearningRate 0.0190   Epoch: 11   Global Step: 140270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:02,238-Speed 2956.39 samples/sec   Loss 5.8955   LearningRate 0.0189   Epoch: 11   Global Step: 140280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:05,560-Speed 3083.75 samples/sec   Loss 5.9999   LearningRate 0.0189   Epoch: 11   Global Step: 140290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:08,928-Speed 3041.57 samples/sec   Loss 6.0515   LearningRate 0.0189   Epoch: 11   Global Step: 140300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:12,341-Speed 3001.37 samples/sec   Loss 6.0652   LearningRate 0.0189   Epoch: 11   Global Step: 140310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:15,704-Speed 3045.54 samples/sec   Loss 5.9725   LearningRate 0.0189   Epoch: 11   Global Step: 140320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:19,054-Speed 3057.48 samples/sec   Loss 5.9924   LearningRate 0.0189   Epoch: 11   Global Step: 140330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:22,394-Speed 3067.51 samples/sec   Loss 5.9059   LearningRate 0.0189   Epoch: 11   Global Step: 140340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:25,702-Speed 3096.03 samples/sec   Loss 5.9657   LearningRate 0.0189   Epoch: 11   Global Step: 140350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:00:29,034-Speed 3074.69 samples/sec   Loss 6.0147   LearningRate 0.0189   Epoch: 11   Global Step: 140360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:32,442-Speed 3005.49 samples/sec   Loss 5.9973   LearningRate 0.0189   Epoch: 11   Global Step: 140370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:35,807-Speed 3044.07 samples/sec   Loss 5.9821   LearningRate 0.0189   Epoch: 11   Global Step: 140380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:39,250-Speed 2975.50 samples/sec   Loss 5.9842   LearningRate 0.0189   Epoch: 11   Global Step: 140390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:42,646-Speed 3015.68 samples/sec   Loss 6.0331   LearningRate 0.0189   Epoch: 11   Global Step: 140400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:46,032-Speed 3025.26 samples/sec   Loss 5.9948   LearningRate 0.0189   Epoch: 11   Global Step: 140410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:49,412-Speed 3030.36 samples/sec   Loss 6.0152   LearningRate 0.0189   Epoch: 11   Global Step: 140420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:52,890-Speed 2944.95 samples/sec   Loss 5.9712   LearningRate 0.0189   Epoch: 11   Global Step: 140430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:56,303-Speed 3001.82 samples/sec   Loss 6.0895   LearningRate 0.0189   Epoch: 11   Global Step: 140440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:00:59,706-Speed 3009.91 samples/sec   Loss 6.0284   LearningRate 0.0189   Epoch: 11   Global Step: 140450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:01:03,134-Speed 2987.69 samples/sec   Loss 6.0877   LearningRate 0.0189   Epoch: 11   Global Step: 140460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:06,536-Speed 3010.99 samples/sec   Loss 5.9498   LearningRate 0.0189   Epoch: 11   Global Step: 140470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:09,901-Speed 3044.22 samples/sec   Loss 6.1691   LearningRate 0.0189   Epoch: 11   Global Step: 140480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:13,320-Speed 2996.31 samples/sec   Loss 5.9454   LearningRate 0.0189   Epoch: 11   Global Step: 140490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:16,743-Speed 2991.46 samples/sec   Loss 6.0149   LearningRate 0.0189   Epoch: 11   Global Step: 140500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:20,139-Speed 3016.49 samples/sec   Loss 6.0060   LearningRate 0.0189   Epoch: 11   Global Step: 140510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:23,574-Speed 2981.95 samples/sec   Loss 6.0662   LearningRate 0.0189   Epoch: 11   Global Step: 140520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:27,090-Speed 2913.51 samples/sec   Loss 5.9056   LearningRate 0.0189   Epoch: 11   Global Step: 140530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:30,477-Speed 3024.13 samples/sec   Loss 6.0433   LearningRate 0.0189   Epoch: 11   Global Step: 140540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:33,893-Speed 2998.16 samples/sec   Loss 6.0337   LearningRate 0.0189   Epoch: 11   Global Step: 140550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:37,238-Speed 3062.73 samples/sec   Loss 5.9783   LearningRate 0.0189   Epoch: 11   Global Step: 140560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:40,710-Speed 2950.61 samples/sec   Loss 5.9667   LearningRate 0.0188   Epoch: 11   Global Step: 140570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:44,158-Speed 2970.07 samples/sec   Loss 6.0795   LearningRate 0.0188   Epoch: 11   Global Step: 140580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:47,492-Speed 3072.28 samples/sec   Loss 6.0601   LearningRate 0.0188   Epoch: 11   Global Step: 140590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:50,848-Speed 3051.79 samples/sec   Loss 6.0215   LearningRate 0.0188   Epoch: 11   Global Step: 140600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:54,184-Speed 3070.99 samples/sec   Loss 6.0858   LearningRate 0.0188   Epoch: 11   Global Step: 140610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:01:57,577-Speed 3018.96 samples/sec   Loss 5.9791   LearningRate 0.0188   Epoch: 11   Global Step: 140620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:00,983-Speed 3007.11 samples/sec   Loss 6.0107   LearningRate 0.0188   Epoch: 11   Global Step: 140630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:04,395-Speed 3002.44 samples/sec   Loss 6.0459   LearningRate 0.0188   Epoch: 11   Global Step: 140640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:07,826-Speed 2984.83 samples/sec   Loss 6.0485   LearningRate 0.0188   Epoch: 11   Global Step: 140650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:11,186-Speed 3048.60 samples/sec   Loss 6.0900   LearningRate 0.0188   Epoch: 11   Global Step: 140660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:14,676-Speed 2935.35 samples/sec   Loss 6.0285   LearningRate 0.0188   Epoch: 11   Global Step: 140670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:18,079-Speed 3009.96 samples/sec   Loss 6.1089   LearningRate 0.0188   Epoch: 11   Global Step: 140680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:21,442-Speed 3045.42 samples/sec   Loss 5.9742   LearningRate 0.0188   Epoch: 11   Global Step: 140690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:24,874-Speed 2985.15 samples/sec   Loss 6.0248   LearningRate 0.0188   Epoch: 11   Global Step: 140700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:28,316-Speed 2975.52 samples/sec   Loss 6.0981   LearningRate 0.0188   Epoch: 11   Global Step: 140710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:31,702-Speed 3025.30 samples/sec   Loss 6.0532   LearningRate 0.0188   Epoch: 11   Global Step: 140720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:02:35,042-Speed 3066.59 samples/sec   Loss 6.2298   LearningRate 0.0188   Epoch: 11   Global Step: 140730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:02:38,403-Speed 3047.74 samples/sec   Loss 6.0948   LearningRate 0.0188   Epoch: 11   Global Step: 140740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:02:41,801-Speed 3013.74 samples/sec   Loss 5.9655   LearningRate 0.0188   Epoch: 11   Global Step: 140750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:02:45,161-Speed 3049.15 samples/sec   Loss 6.0704   LearningRate 0.0188   Epoch: 11   Global Step: 140760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:48,520-Speed 3049.19 samples/sec   Loss 6.0493   LearningRate 0.0188   Epoch: 11   Global Step: 140770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:51,896-Speed 3034.46 samples/sec   Loss 6.0584   LearningRate 0.0188   Epoch: 11   Global Step: 140780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:55,289-Speed 3018.92 samples/sec   Loss 6.0689   LearningRate 0.0188   Epoch: 11   Global Step: 140790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:02:58,666-Speed 3032.60 samples/sec   Loss 6.1000   LearningRate 0.0188   Epoch: 11   Global Step: 140800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:03:02,056-Speed 3021.79 samples/sec   Loss 6.0292   LearningRate 0.0188   Epoch: 11   Global Step: 140810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:03:05,401-Speed 3062.14 samples/sec   Loss 6.0161   LearningRate 0.0188   Epoch: 11   Global Step: 140820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:08,750-Speed 3059.10 samples/sec   Loss 6.0101   LearningRate 0.0188   Epoch: 11   Global Step: 140830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:12,059-Speed 3094.94 samples/sec   Loss 5.9320   LearningRate 0.0188   Epoch: 11   Global Step: 140840   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:15,413-Speed 3054.05 samples/sec   Loss 6.1624   LearningRate 0.0188   Epoch: 11   Global Step: 140850   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:18,819-Speed 3007.34 samples/sec   Loss 6.0544   LearningRate 0.0187   Epoch: 11   Global Step: 140860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:22,230-Speed 3002.76 samples/sec   Loss 6.0456   LearningRate 0.0187   Epoch: 11   Global Step: 140870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:25,586-Speed 3052.30 samples/sec   Loss 6.0853   LearningRate 0.0187   Epoch: 11   Global Step: 140880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:28,929-Speed 3063.80 samples/sec   Loss 6.0772   LearningRate 0.0187   Epoch: 11   Global Step: 140890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:32,267-Speed 3068.90 samples/sec   Loss 6.1586   LearningRate 0.0187   Epoch: 11   Global Step: 140900   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:35,636-Speed 3040.36 samples/sec   Loss 6.1444   LearningRate 0.0187   Epoch: 11   Global Step: 140910   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:39,003-Speed 3042.59 samples/sec   Loss 6.0616   LearningRate 0.0187   Epoch: 11   Global Step: 140920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:42,352-Speed 3057.90 samples/sec   Loss 6.0052   LearningRate 0.0187   Epoch: 11   Global Step: 140930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:45,779-Speed 2989.33 samples/sec   Loss 6.1328   LearningRate 0.0187   Epoch: 11   Global Step: 140940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:49,203-Speed 2991.69 samples/sec   Loss 6.0153   LearningRate 0.0187   Epoch: 11   Global Step: 140950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:52,598-Speed 3016.39 samples/sec   Loss 5.9689   LearningRate 0.0187   Epoch: 11   Global Step: 140960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:56,015-Speed 2997.52 samples/sec   Loss 6.0513   LearningRate 0.0187   Epoch: 11   Global Step: 140970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:03:59,397-Speed 3028.96 samples/sec   Loss 6.0328   LearningRate 0.0187   Epoch: 11   Global Step: 140980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:04:02,827-Speed 2986.92 samples/sec   Loss 6.1596   LearningRate 0.0187   Epoch: 11   Global Step: 140990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:04:06,291-Speed 2956.91 samples/sec   Loss 6.0768   LearningRate 0.0187   Epoch: 11   Global Step: 141000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:04:09,661-Speed 3039.06 samples/sec   Loss 6.0564   LearningRate 0.0187   Epoch: 11   Global Step: 141010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:04:13,117-Speed 2963.68 samples/sec   Loss 6.1069   LearningRate 0.0187   Epoch: 11   Global Step: 141020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:04:16,511-Speed 3018.71 samples/sec   Loss 6.1503   LearningRate 0.0187   Epoch: 11   Global Step: 141030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:04:19,880-Speed 3040.44 samples/sec   Loss 6.0768   LearningRate 0.0187   Epoch: 11   Global Step: 141040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:04:23,235-Speed 3052.73 samples/sec   Loss 6.0050   LearningRate 0.0187   Epoch: 11   Global Step: 141050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:04:26,622-Speed 3024.41 samples/sec   Loss 6.1126   LearningRate 0.0187   Epoch: 11   Global Step: 141060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:04:30,047-Speed 2990.86 samples/sec   Loss 6.0153   LearningRate 0.0187   Epoch: 11   Global Step: 141070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:04:33,421-Speed 3036.05 samples/sec   Loss 6.0740   LearningRate 0.0187   Epoch: 11   Global Step: 141080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:04:36,820-Speed 3013.79 samples/sec   Loss 6.0493   LearningRate 0.0187   Epoch: 11   Global Step: 141090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:04:40,150-Speed 3075.46 samples/sec   Loss 6.0226   LearningRate 0.0187   Epoch: 11   Global Step: 141100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:04:43,575-Speed 2990.80 samples/sec   Loss 6.0200   LearningRate 0.0187   Epoch: 11   Global Step: 141110   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:04:46,982-Speed 3006.31 samples/sec   Loss 5.9732   LearningRate 0.0187   Epoch: 11   Global Step: 141120   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:04:50,381-Speed 3013.82 samples/sec   Loss 6.1581   LearningRate 0.0187   Epoch: 11   Global Step: 141130   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:04:53,736-Speed 3052.79 samples/sec   Loss 6.0002   LearningRate 0.0186   Epoch: 11   Global Step: 141140   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:04:57,062-Speed 3079.59 samples/sec   Loss 5.9862   LearningRate 0.0186   Epoch: 11   Global Step: 141150   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:05:00,478-Speed 2998.78 samples/sec   Loss 6.1454   LearningRate 0.0186   Epoch: 11   Global Step: 141160   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:05:03,896-Speed 2995.96 samples/sec   Loss 6.1187   LearningRate 0.0186   Epoch: 11   Global Step: 141170   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:05:07,246-Speed 3057.67 samples/sec   Loss 5.9969   LearningRate 0.0186   Epoch: 11   Global Step: 141180   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:05:10,587-Speed 3066.45 samples/sec   Loss 6.0654   LearningRate 0.0186   Epoch: 11   Global Step: 141190   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:05:13,977-Speed 3020.86 samples/sec   Loss 6.0693   LearningRate 0.0186   Epoch: 11   Global Step: 141200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:05:17,371-Speed 3018.53 samples/sec   Loss 6.0651   LearningRate 0.0186   Epoch: 11   Global Step: 141210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:05:20,832-Speed 2959.46 samples/sec   Loss 6.0717   LearningRate 0.0186   Epoch: 11   Global Step: 141220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:05:24,192-Speed 3048.96 samples/sec   Loss 6.0469   LearningRate 0.0186   Epoch: 11   Global Step: 141230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:05:27,568-Speed 3033.51 samples/sec   Loss 6.1408   LearningRate 0.0186   Epoch: 11   Global Step: 141240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:05:30,955-Speed 3024.44 samples/sec   Loss 5.9482   LearningRate 0.0186   Epoch: 11   Global Step: 141250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:05:34,281-Speed 3079.05 samples/sec   Loss 6.0549   LearningRate 0.0186   Epoch: 11   Global Step: 141260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:05:37,728-Speed 2972.15 samples/sec   Loss 6.1253   LearningRate 0.0186   Epoch: 11   Global Step: 141270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:05:41,087-Speed 3049.47 samples/sec   Loss 6.0372   LearningRate 0.0186   Epoch: 11   Global Step: 141280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:05:44,409-Speed 3082.52 samples/sec   Loss 6.0208   LearningRate 0.0186   Epoch: 11   Global Step: 141290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:05:47,814-Speed 3008.50 samples/sec   Loss 6.0932   LearningRate 0.0186   Epoch: 11   Global Step: 141300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:05:51,229-Speed 2999.60 samples/sec   Loss 6.0918   LearningRate 0.0186   Epoch: 11   Global Step: 141310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:05:54,661-Speed 2984.54 samples/sec   Loss 6.1111   LearningRate 0.0186   Epoch: 11   Global Step: 141320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:05:57,989-Speed 3077.32 samples/sec   Loss 5.9630   LearningRate 0.0186   Epoch: 11   Global Step: 141330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:01,410-Speed 2994.14 samples/sec   Loss 6.1713   LearningRate 0.0186   Epoch: 11   Global Step: 141340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:04,798-Speed 3023.42 samples/sec   Loss 5.9923   LearningRate 0.0186   Epoch: 11   Global Step: 141350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:08,165-Speed 3042.75 samples/sec   Loss 6.1996   LearningRate 0.0186   Epoch: 11   Global Step: 141360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:11,637-Speed 2950.06 samples/sec   Loss 6.0527   LearningRate 0.0186   Epoch: 11   Global Step: 141370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:15,070-Speed 2983.50 samples/sec   Loss 6.0804   LearningRate 0.0186   Epoch: 11   Global Step: 141380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:18,472-Speed 3010.55 samples/sec   Loss 6.0792   LearningRate 0.0186   Epoch: 11   Global Step: 141390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:21,883-Speed 3002.93 samples/sec   Loss 6.1615   LearningRate 0.0186   Epoch: 11   Global Step: 141400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:25,232-Speed 3058.82 samples/sec   Loss 6.0133   LearningRate 0.0186   Epoch: 11   Global Step: 141410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:28,667-Speed 2982.64 samples/sec   Loss 6.1059   LearningRate 0.0186   Epoch: 11   Global Step: 141420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:32,072-Speed 3008.34 samples/sec   Loss 6.0868   LearningRate 0.0185   Epoch: 11   Global Step: 141430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:35,436-Speed 3044.83 samples/sec   Loss 6.0616   LearningRate 0.0185   Epoch: 11   Global Step: 141440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:38,773-Speed 3070.41 samples/sec   Loss 6.0101   LearningRate 0.0185   Epoch: 11   Global Step: 141450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:42,185-Speed 3002.35 samples/sec   Loss 6.1666   LearningRate 0.0185   Epoch: 11   Global Step: 141460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:45,589-Speed 3008.41 samples/sec   Loss 6.1293   LearningRate 0.0185   Epoch: 11   Global Step: 141470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:48,929-Speed 3067.34 samples/sec   Loss 6.1280   LearningRate 0.0185   Epoch: 11   Global Step: 141480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:52,296-Speed 3041.90 samples/sec   Loss 6.0803   LearningRate 0.0185   Epoch: 11   Global Step: 141490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:55,715-Speed 2996.13 samples/sec   Loss 6.0958   LearningRate 0.0185   Epoch: 11   Global Step: 141500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:06:59,058-Speed 3063.85 samples/sec   Loss 6.0465   LearningRate 0.0185   Epoch: 11   Global Step: 141510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:07:02,444-Speed 3025.01 samples/sec   Loss 6.0898   LearningRate 0.0185   Epoch: 11   Global Step: 141520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:07:05,787-Speed 3064.14 samples/sec   Loss 5.9964   LearningRate 0.0185   Epoch: 11   Global Step: 141530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:07:09,157-Speed 3039.52 samples/sec   Loss 6.0365   LearningRate 0.0185   Epoch: 11   Global Step: 141540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:07:12,570-Speed 3001.02 samples/sec   Loss 6.1297   LearningRate 0.0185   Epoch: 11   Global Step: 141550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:07:15,960-Speed 3021.33 samples/sec   Loss 6.1287   LearningRate 0.0185   Epoch: 11   Global Step: 141560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:07:19,363-Speed 3010.65 samples/sec   Loss 6.0587   LearningRate 0.0185   Epoch: 11   Global Step: 141570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:07:22,730-Speed 3041.99 samples/sec   Loss 6.1326   LearningRate 0.0185   Epoch: 11   Global Step: 141580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:07:26,044-Speed 3091.00 samples/sec   Loss 6.1838   LearningRate 0.0185   Epoch: 11   Global Step: 141590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:07:29,482-Speed 2979.12 samples/sec   Loss 6.1521   LearningRate 0.0185   Epoch: 11   Global Step: 141600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:07:32,810-Speed 3078.37 samples/sec   Loss 6.0908   LearningRate 0.0185   Epoch: 11   Global Step: 141610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:07:36,161-Speed 3056.31 samples/sec   Loss 6.1520   LearningRate 0.0185   Epoch: 11   Global Step: 141620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:07:39,565-Speed 3009.13 samples/sec   Loss 6.1228   LearningRate 0.0185   Epoch: 11   Global Step: 141630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:07:42,930-Speed 3044.49 samples/sec   Loss 6.1343   LearningRate 0.0185   Epoch: 11   Global Step: 141640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:07:46,249-Speed 3085.41 samples/sec   Loss 6.0623   LearningRate 0.0185   Epoch: 11   Global Step: 141650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:07:49,627-Speed 3032.88 samples/sec   Loss 6.1802   LearningRate 0.0185   Epoch: 11   Global Step: 141660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:07:53,033-Speed 3007.14 samples/sec   Loss 6.1495   LearningRate 0.0185   Epoch: 11   Global Step: 141670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:07:56,470-Speed 2980.32 samples/sec   Loss 6.0112   LearningRate 0.0185   Epoch: 11   Global Step: 141680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:07:59,810-Speed 3066.48 samples/sec   Loss 6.1871   LearningRate 0.0185   Epoch: 11   Global Step: 141690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:08:03,202-Speed 3019.62 samples/sec   Loss 6.0990   LearningRate 0.0185   Epoch: 11   Global Step: 141700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:08:06,617-Speed 2999.90 samples/sec   Loss 6.0872   LearningRate 0.0185   Epoch: 11   Global Step: 141710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:08:10,073-Speed 2963.62 samples/sec   Loss 6.0399   LearningRate 0.0184   Epoch: 11   Global Step: 141720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:08:13,499-Speed 2990.01 samples/sec   Loss 6.1295   LearningRate 0.0184   Epoch: 11   Global Step: 141730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:08:16,942-Speed 2974.34 samples/sec   Loss 6.2241   LearningRate 0.0184   Epoch: 11   Global Step: 141740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:08:20,344-Speed 3011.52 samples/sec   Loss 6.1116   LearningRate 0.0184   Epoch: 11   Global Step: 141750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:08:23,695-Speed 3056.58 samples/sec   Loss 6.0613   LearningRate 0.0184   Epoch: 11   Global Step: 141760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:08:27,077-Speed 3028.69 samples/sec   Loss 6.1695   LearningRate 0.0184   Epoch: 11   Global Step: 141770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:08:30,444-Speed 3042.69 samples/sec   Loss 6.0984   LearningRate 0.0184   Epoch: 11   Global Step: 141780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:08:33,780-Speed 3069.81 samples/sec   Loss 6.2294   LearningRate 0.0184   Epoch: 11   Global Step: 141790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:08:37,189-Speed 3004.37 samples/sec   Loss 6.1062   LearningRate 0.0184   Epoch: 11   Global Step: 141800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:08:40,504-Speed 3090.68 samples/sec   Loss 6.0262   LearningRate 0.0184   Epoch: 11   Global Step: 141810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:08:43,869-Speed 3043.77 samples/sec   Loss 6.1394   LearningRate 0.0184   Epoch: 11   Global Step: 141820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:08:47,231-Speed 3046.92 samples/sec   Loss 6.1997   LearningRate 0.0184   Epoch: 11   Global Step: 141830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:08:50,650-Speed 2995.44 samples/sec   Loss 5.9460   LearningRate 0.0184   Epoch: 11   Global Step: 141840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:08:53,963-Speed 3092.36 samples/sec   Loss 6.1688   LearningRate 0.0184   Epoch: 11   Global Step: 141850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:08:57,329-Speed 3042.55 samples/sec   Loss 6.0305   LearningRate 0.0184   Epoch: 11   Global Step: 141860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:00,695-Speed 3043.96 samples/sec   Loss 6.1646   LearningRate 0.0184   Epoch: 11   Global Step: 141870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:04,115-Speed 2994.80 samples/sec   Loss 6.1699   LearningRate 0.0184   Epoch: 11   Global Step: 141880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:07,624-Speed 2918.89 samples/sec   Loss 6.0187   LearningRate 0.0184   Epoch: 11   Global Step: 141890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:11,043-Speed 2995.02 samples/sec   Loss 6.1555   LearningRate 0.0184   Epoch: 11   Global Step: 141900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:09:14,435-Speed 3020.51 samples/sec   Loss 6.2413   LearningRate 0.0184   Epoch: 11   Global Step: 141910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:09:17,806-Speed 3038.83 samples/sec   Loss 6.1054   LearningRate 0.0184   Epoch: 11   Global Step: 141920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:09:21,142-Speed 3070.37 samples/sec   Loss 6.0916   LearningRate 0.0184   Epoch: 11   Global Step: 141930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:09:24,596-Speed 2965.62 samples/sec   Loss 6.1536   LearningRate 0.0184   Epoch: 11   Global Step: 141940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:09:27,960-Speed 3045.28 samples/sec   Loss 6.1383   LearningRate 0.0184   Epoch: 11   Global Step: 141950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:31,405-Speed 2973.27 samples/sec   Loss 6.0103   LearningRate 0.0184   Epoch: 11   Global Step: 141960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:34,794-Speed 3022.08 samples/sec   Loss 5.9830   LearningRate 0.0184   Epoch: 11   Global Step: 141970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:38,210-Speed 2999.17 samples/sec   Loss 6.1330   LearningRate 0.0184   Epoch: 11   Global Step: 141980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:41,553-Speed 3063.60 samples/sec   Loss 6.0916   LearningRate 0.0184   Epoch: 11   Global Step: 141990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:44,951-Speed 3014.44 samples/sec   Loss 6.1640   LearningRate 0.0184   Epoch: 11   Global Step: 142000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:48,272-Speed 3084.81 samples/sec   Loss 6.1242   LearningRate 0.0183   Epoch: 11   Global Step: 142010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:51,626-Speed 3054.08 samples/sec   Loss 6.2336   LearningRate 0.0183   Epoch: 11   Global Step: 142020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:55,021-Speed 3016.44 samples/sec   Loss 6.1720   LearningRate 0.0183   Epoch: 11   Global Step: 142030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:09:58,350-Speed 3077.59 samples/sec   Loss 6.1150   LearningRate 0.0183   Epoch: 11   Global Step: 142040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:10:01,726-Speed 3034.08 samples/sec   Loss 6.1678   LearningRate 0.0183   Epoch: 11   Global Step: 142050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:10:05,187-Speed 2959.66 samples/sec   Loss 6.0554   LearningRate 0.0183   Epoch: 11   Global Step: 142060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:10:08,590-Speed 3009.60 samples/sec   Loss 6.1229   LearningRate 0.0183   Epoch: 11   Global Step: 142070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:10:11,967-Speed 3033.90 samples/sec   Loss 6.1347   LearningRate 0.0183   Epoch: 11   Global Step: 142080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:10:15,297-Speed 3075.85 samples/sec   Loss 6.1054   LearningRate 0.0183   Epoch: 11   Global Step: 142090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:10:18,779-Speed 2941.34 samples/sec   Loss 6.1310   LearningRate 0.0183   Epoch: 11   Global Step: 142100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:10:22,112-Speed 3074.08 samples/sec   Loss 6.0116   LearningRate 0.0183   Epoch: 11   Global Step: 142110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:10:25,433-Speed 3083.50 samples/sec   Loss 6.1626   LearningRate 0.0183   Epoch: 11   Global Step: 142120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:10:28,828-Speed 3017.76 samples/sec   Loss 6.1385   LearningRate 0.0183   Epoch: 11   Global Step: 142130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:10:32,219-Speed 3020.60 samples/sec   Loss 6.0508   LearningRate 0.0183   Epoch: 11   Global Step: 142140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:10:35,608-Speed 3022.56 samples/sec   Loss 6.0412   LearningRate 0.0183   Epoch: 11   Global Step: 142150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:10:39,084-Speed 2946.22 samples/sec   Loss 6.0526   LearningRate 0.0183   Epoch: 11   Global Step: 142160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:10:42,474-Speed 3021.78 samples/sec   Loss 6.1290   LearningRate 0.0183   Epoch: 11   Global Step: 142170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:10:45,929-Speed 2964.44 samples/sec   Loss 6.1032   LearningRate 0.0183   Epoch: 11   Global Step: 142180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:10:49,325-Speed 3016.23 samples/sec   Loss 6.0732   LearningRate 0.0183   Epoch: 11   Global Step: 142190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:10:52,652-Speed 3079.09 samples/sec   Loss 6.0714   LearningRate 0.0183   Epoch: 11   Global Step: 142200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:10:56,026-Speed 3035.53 samples/sec   Loss 6.1562   LearningRate 0.0183   Epoch: 11   Global Step: 142210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:10:59,397-Speed 3039.10 samples/sec   Loss 6.1150   LearningRate 0.0183   Epoch: 11   Global Step: 142220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:11:02,758-Speed 3047.18 samples/sec   Loss 6.1289   LearningRate 0.0183   Epoch: 11   Global Step: 142230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:11:06,107-Speed 3058.73 samples/sec   Loss 6.1458   LearningRate 0.0183   Epoch: 11   Global Step: 142240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:11:09,526-Speed 2996.27 samples/sec   Loss 6.1265   LearningRate 0.0183   Epoch: 11   Global Step: 142250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:11:12,927-Speed 3011.22 samples/sec   Loss 6.1862   LearningRate 0.0183   Epoch: 11   Global Step: 142260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:11:16,331-Speed 3009.00 samples/sec   Loss 6.1303   LearningRate 0.0183   Epoch: 11   Global Step: 142270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:11:19,701-Speed 3039.53 samples/sec   Loss 6.1427   LearningRate 0.0183   Epoch: 11   Global Step: 142280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:11:23,056-Speed 3053.68 samples/sec   Loss 6.1247   LearningRate 0.0183   Epoch: 11   Global Step: 142290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:11:26,494-Speed 2980.11 samples/sec   Loss 6.1614   LearningRate 0.0182   Epoch: 11   Global Step: 142300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:11:29,935-Speed 2977.02 samples/sec   Loss 6.0247   LearningRate 0.0182   Epoch: 11   Global Step: 142310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:11:33,403-Speed 2953.70 samples/sec   Loss 6.1596   LearningRate 0.0182   Epoch: 11   Global Step: 142320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:11:36,812-Speed 3004.08 samples/sec   Loss 6.1555   LearningRate 0.0182   Epoch: 11   Global Step: 142330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:11:40,198-Speed 3025.78 samples/sec   Loss 6.1807   LearningRate 0.0182   Epoch: 11   Global Step: 142340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:11:43,543-Speed 3061.58 samples/sec   Loss 6.1541   LearningRate 0.0182   Epoch: 11   Global Step: 142350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:11:46,895-Speed 3056.08 samples/sec   Loss 6.1832   LearningRate 0.0182   Epoch: 11   Global Step: 142360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:11:50,273-Speed 3032.21 samples/sec   Loss 6.2350   LearningRate 0.0182   Epoch: 11   Global Step: 142370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:11:53,656-Speed 3027.74 samples/sec   Loss 6.1247   LearningRate 0.0182   Epoch: 11   Global Step: 142380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:11:57,050-Speed 3018.06 samples/sec   Loss 5.9820   LearningRate 0.0182   Epoch: 11   Global Step: 142390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:12:00,418-Speed 3040.85 samples/sec   Loss 6.1091   LearningRate 0.0182   Epoch: 11   Global Step: 142400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:12:03,817-Speed 3014.05 samples/sec   Loss 6.0950   LearningRate 0.0182   Epoch: 11   Global Step: 142410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:12:07,234-Speed 2997.02 samples/sec   Loss 6.1340   LearningRate 0.0182   Epoch: 11   Global Step: 142420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:12:10,653-Speed 2996.08 samples/sec   Loss 6.1951   LearningRate 0.0182   Epoch: 11   Global Step: 142430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:12:14,006-Speed 3054.84 samples/sec   Loss 6.0403   LearningRate 0.0182   Epoch: 11   Global Step: 142440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:12:17,365-Speed 3049.95 samples/sec   Loss 6.2114   LearningRate 0.0182   Epoch: 11   Global Step: 142450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:12:20,763-Speed 3013.73 samples/sec   Loss 6.1107   LearningRate 0.0182   Epoch: 11   Global Step: 142460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:12:24,112-Speed 3058.83 samples/sec   Loss 6.1785   LearningRate 0.0182   Epoch: 11   Global Step: 142470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:12:27,498-Speed 3024.99 samples/sec   Loss 6.0504   LearningRate 0.0182   Epoch: 11   Global Step: 142480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:12:30,890-Speed 3019.99 samples/sec   Loss 6.0724   LearningRate 0.0182   Epoch: 11   Global Step: 142490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:12:34,296-Speed 3008.07 samples/sec   Loss 6.1225   LearningRate 0.0182   Epoch: 11   Global Step: 142500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:12:37,803-Speed 2920.50 samples/sec   Loss 6.2142   LearningRate 0.0182   Epoch: 11   Global Step: 142510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:12:41,181-Speed 3032.50 samples/sec   Loss 6.1859   LearningRate 0.0182   Epoch: 11   Global Step: 142520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:12:44,515-Speed 3072.12 samples/sec   Loss 6.1230   LearningRate 0.0182   Epoch: 11   Global Step: 142530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:12:47,883-Speed 3041.03 samples/sec   Loss 6.1615   LearningRate 0.0182   Epoch: 11   Global Step: 142540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:12:51,277-Speed 3018.39 samples/sec   Loss 6.1300   LearningRate 0.0182   Epoch: 11   Global Step: 142550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:12:54,690-Speed 3000.90 samples/sec   Loss 6.0600   LearningRate 0.0182   Epoch: 11   Global Step: 142560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:12:58,055-Speed 3044.26 samples/sec   Loss 6.0538   LearningRate 0.0182   Epoch: 11   Global Step: 142570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:13:01,462-Speed 3006.27 samples/sec   Loss 6.0266   LearningRate 0.0182   Epoch: 11   Global Step: 142580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:13:04,821-Speed 3049.93 samples/sec   Loss 6.1835   LearningRate 0.0181   Epoch: 11   Global Step: 142590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:13:08,201-Speed 3030.46 samples/sec   Loss 6.1501   LearningRate 0.0181   Epoch: 11   Global Step: 142600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:13:11,563-Speed 3046.36 samples/sec   Loss 6.1584   LearningRate 0.0181   Epoch: 11   Global Step: 142610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:13:14,935-Speed 3037.46 samples/sec   Loss 6.0627   LearningRate 0.0181   Epoch: 11   Global Step: 142620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:13:18,395-Speed 2960.56 samples/sec   Loss 6.1212   LearningRate 0.0181   Epoch: 11   Global Step: 142630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:13:21,766-Speed 3038.72 samples/sec   Loss 6.1552   LearningRate 0.0181   Epoch: 11   Global Step: 142640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:13:25,272-Speed 2921.80 samples/sec   Loss 6.1244   LearningRate 0.0181   Epoch: 11   Global Step: 142650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:13:28,644-Speed 3038.59 samples/sec   Loss 6.2256   LearningRate 0.0181   Epoch: 11   Global Step: 142660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:13:32,134-Speed 2935.10 samples/sec   Loss 6.2136   LearningRate 0.0181   Epoch: 11   Global Step: 142670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:13:35,499-Speed 3044.27 samples/sec   Loss 6.0287   LearningRate 0.0181   Epoch: 11   Global Step: 142680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:13:38,865-Speed 3042.59 samples/sec   Loss 6.0978   LearningRate 0.0181   Epoch: 11   Global Step: 142690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:13:42,248-Speed 3028.16 samples/sec   Loss 6.0791   LearningRate 0.0181   Epoch: 11   Global Step: 142700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:13:45,600-Speed 3055.62 samples/sec   Loss 6.1654   LearningRate 0.0181   Epoch: 11   Global Step: 142710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:13:48,928-Speed 3077.85 samples/sec   Loss 6.1347   LearningRate 0.0181   Epoch: 11   Global Step: 142720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:13:52,256-Speed 3078.55 samples/sec   Loss 6.1979   LearningRate 0.0181   Epoch: 11   Global Step: 142730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:13:55,587-Speed 3074.93 samples/sec   Loss 6.0577   LearningRate 0.0181   Epoch: 11   Global Step: 142740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:13:58,917-Speed 3076.48 samples/sec   Loss 6.1782   LearningRate 0.0181   Epoch: 11   Global Step: 142750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:14:02,324-Speed 3005.84 samples/sec   Loss 6.1346   LearningRate 0.0181   Epoch: 11   Global Step: 142760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:14:05,742-Speed 2997.10 samples/sec   Loss 6.1154   LearningRate 0.0181   Epoch: 11   Global Step: 142770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:14:09,141-Speed 3013.76 samples/sec   Loss 6.2545   LearningRate 0.0181   Epoch: 11   Global Step: 142780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:14:12,547-Speed 3007.24 samples/sec   Loss 6.2276   LearningRate 0.0181   Epoch: 11   Global Step: 142790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:14:15,961-Speed 3000.17 samples/sec   Loss 6.1738   LearningRate 0.0181   Epoch: 11   Global Step: 142800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:14:19,398-Speed 2980.69 samples/sec   Loss 6.0923   LearningRate 0.0181   Epoch: 11   Global Step: 142810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:14:22,785-Speed 3024.33 samples/sec   Loss 6.1310   LearningRate 0.0181   Epoch: 11   Global Step: 142820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:14:26,171-Speed 3025.07 samples/sec   Loss 6.1642   LearningRate 0.0181   Epoch: 11   Global Step: 142830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:14:29,511-Speed 3066.86 samples/sec   Loss 6.1926   LearningRate 0.0181   Epoch: 11   Global Step: 142840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:14:32,992-Speed 2942.21 samples/sec   Loss 6.0883   LearningRate 0.0181   Epoch: 11   Global Step: 142850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:14:36,384-Speed 3020.42 samples/sec   Loss 6.0626   LearningRate 0.0181   Epoch: 11   Global Step: 142860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-27 15:14:39,790-Speed 3007.16 samples/sec   Loss 6.1119   LearningRate 0.0181   Epoch: 11   Global Step: 142870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:14:43,105-Speed 3090.04 samples/sec   Loss 6.1572   LearningRate 0.0180   Epoch: 11   Global Step: 142880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:14:46,521-Speed 2998.49 samples/sec   Loss 6.0650   LearningRate 0.0180   Epoch: 11   Global Step: 142890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:14:49,936-Speed 2998.85 samples/sec   Loss 6.1178   LearningRate 0.0180   Epoch: 11   Global Step: 142900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:14:53,379-Speed 2974.75 samples/sec   Loss 6.1197   LearningRate 0.0180   Epoch: 11   Global Step: 142910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:14:56,764-Speed 3026.40 samples/sec   Loss 6.1820   LearningRate 0.0180   Epoch: 11   Global Step: 142920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:15:00,219-Speed 2963.92 samples/sec   Loss 6.0644   LearningRate 0.0180   Epoch: 11   Global Step: 142930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:15:03,588-Speed 3042.46 samples/sec   Loss 6.0937   LearningRate 0.0180   Epoch: 11   Global Step: 142940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:15:06,986-Speed 3014.87 samples/sec   Loss 6.1076   LearningRate 0.0180   Epoch: 11   Global Step: 142950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:15:10,432-Speed 2973.01 samples/sec   Loss 6.1233   LearningRate 0.0180   Epoch: 11   Global Step: 142960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:15:13,951-Speed 2910.36 samples/sec   Loss 6.1093   LearningRate 0.0180   Epoch: 11   Global Step: 142970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:15:17,348-Speed 3015.74 samples/sec   Loss 6.0601   LearningRate 0.0180   Epoch: 11   Global Step: 142980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:15:20,737-Speed 3021.92 samples/sec   Loss 6.1915   LearningRate 0.0180   Epoch: 11   Global Step: 142990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:15:24,117-Speed 3030.36 samples/sec   Loss 6.0990   LearningRate 0.0180   Epoch: 11   Global Step: 143000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:15:27,488-Speed 3038.53 samples/sec   Loss 6.0170   LearningRate 0.0180   Epoch: 11   Global Step: 143010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:15:30,837-Speed 3058.60 samples/sec   Loss 6.1649   LearningRate 0.0180   Epoch: 11   Global Step: 143020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:15:34,357-Speed 2910.40 samples/sec   Loss 6.1573   LearningRate 0.0180   Epoch: 11   Global Step: 143030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:15:37,769-Speed 3001.51 samples/sec   Loss 6.0951   LearningRate 0.0180   Epoch: 11   Global Step: 143040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:15:41,177-Speed 3006.14 samples/sec   Loss 6.0862   LearningRate 0.0180   Epoch: 11   Global Step: 143050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:15:44,541-Speed 3044.38 samples/sec   Loss 6.0352   LearningRate 0.0180   Epoch: 11   Global Step: 143060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:15:47,964-Speed 2992.87 samples/sec   Loss 6.1128   LearningRate 0.0180   Epoch: 11   Global Step: 143070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:15:51,427-Speed 2957.68 samples/sec   Loss 6.1566   LearningRate 0.0180   Epoch: 11   Global Step: 143080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:15:54,800-Speed 3036.53 samples/sec   Loss 6.0710   LearningRate 0.0180   Epoch: 11   Global Step: 143090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:15:58,190-Speed 3021.67 samples/sec   Loss 6.1544   LearningRate 0.0180   Epoch: 11   Global Step: 143100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:01,557-Speed 3042.65 samples/sec   Loss 6.0794   LearningRate 0.0180   Epoch: 11   Global Step: 143110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:04,884-Speed 3078.09 samples/sec   Loss 6.1074   LearningRate 0.0180   Epoch: 11   Global Step: 143120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:08,238-Speed 3053.97 samples/sec   Loss 6.1731   LearningRate 0.0180   Epoch: 11   Global Step: 143130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:11,663-Speed 2990.91 samples/sec   Loss 6.0770   LearningRate 0.0180   Epoch: 11   Global Step: 143140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:15,013-Speed 3057.96 samples/sec   Loss 6.0408   LearningRate 0.0180   Epoch: 11   Global Step: 143150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:18,434-Speed 2994.17 samples/sec   Loss 6.1571   LearningRate 0.0180   Epoch: 11   Global Step: 143160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:21,792-Speed 3050.36 samples/sec   Loss 6.2189   LearningRate 0.0180   Epoch: 11   Global Step: 143170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:25,206-Speed 2999.87 samples/sec   Loss 6.1163   LearningRate 0.0179   Epoch: 11   Global Step: 143180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:28,642-Speed 2981.25 samples/sec   Loss 6.0935   LearningRate 0.0179   Epoch: 11   Global Step: 143190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:32,158-Speed 2913.62 samples/sec   Loss 6.0558   LearningRate 0.0179   Epoch: 11   Global Step: 143200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:35,562-Speed 3008.79 samples/sec   Loss 6.1647   LearningRate 0.0179   Epoch: 11   Global Step: 143210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:38,989-Speed 2989.57 samples/sec   Loss 6.1905   LearningRate 0.0179   Epoch: 11   Global Step: 143220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:42,385-Speed 3016.04 samples/sec   Loss 6.0592   LearningRate 0.0179   Epoch: 11   Global Step: 143230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:16:45,833-Speed 2970.89 samples/sec   Loss 6.0080   LearningRate 0.0179   Epoch: 11   Global Step: 143240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:16:49,236-Speed 3009.92 samples/sec   Loss 5.9754   LearningRate 0.0179   Epoch: 11   Global Step: 143250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:16:52,652-Speed 2998.89 samples/sec   Loss 6.2130   LearningRate 0.0179   Epoch: 11   Global Step: 143260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:16:56,058-Speed 3007.13 samples/sec   Loss 6.1420   LearningRate 0.0179   Epoch: 11   Global Step: 143270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:16:59,448-Speed 3020.86 samples/sec   Loss 6.2127   LearningRate 0.0179   Epoch: 11   Global Step: 143280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:17:02,828-Speed 3031.34 samples/sec   Loss 5.9544   LearningRate 0.0179   Epoch: 11   Global Step: 143290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:17:06,135-Speed 3096.55 samples/sec   Loss 5.9856   LearningRate 0.0179   Epoch: 11   Global Step: 143300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:09,535-Speed 3013.14 samples/sec   Loss 6.1821   LearningRate 0.0179   Epoch: 11   Global Step: 143310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:12,928-Speed 3019.06 samples/sec   Loss 6.2780   LearningRate 0.0179   Epoch: 11   Global Step: 143320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:16,357-Speed 2986.81 samples/sec   Loss 6.1158   LearningRate 0.0179   Epoch: 11   Global Step: 143330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:19,723-Speed 3042.99 samples/sec   Loss 6.0664   LearningRate 0.0179   Epoch: 11   Global Step: 143340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:23,192-Speed 2953.07 samples/sec   Loss 6.1529   LearningRate 0.0179   Epoch: 11   Global Step: 143350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:26,550-Speed 3049.70 samples/sec   Loss 6.0534   LearningRate 0.0179   Epoch: 11   Global Step: 143360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:30,032-Speed 2942.16 samples/sec   Loss 6.0686   LearningRate 0.0179   Epoch: 11   Global Step: 143370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:33,496-Speed 2956.34 samples/sec   Loss 6.1614   LearningRate 0.0179   Epoch: 11   Global Step: 143380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:36,833-Speed 3069.55 samples/sec   Loss 5.9606   LearningRate 0.0179   Epoch: 11   Global Step: 143390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:40,156-Speed 3083.22 samples/sec   Loss 6.1085   LearningRate 0.0179   Epoch: 11   Global Step: 143400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:43,593-Speed 2979.28 samples/sec   Loss 6.0327   LearningRate 0.0179   Epoch: 11   Global Step: 143410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:47,021-Speed 2988.45 samples/sec   Loss 6.1119   LearningRate 0.0179   Epoch: 11   Global Step: 143420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:50,458-Speed 2980.10 samples/sec   Loss 6.1108   LearningRate 0.0179   Epoch: 11   Global Step: 143430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:17:53,882-Speed 2991.67 samples/sec   Loss 6.1281   LearningRate 0.0179   Epoch: 11   Global Step: 143440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:17:57,316-Speed 2982.84 samples/sec   Loss 6.1248   LearningRate 0.0179   Epoch: 11   Global Step: 143450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:00,728-Speed 3002.68 samples/sec   Loss 6.1451   LearningRate 0.0179   Epoch: 11   Global Step: 143460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:04,162-Speed 2982.10 samples/sec   Loss 6.1687   LearningRate 0.0178   Epoch: 11   Global Step: 143470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:07,613-Speed 2969.36 samples/sec   Loss 6.2183   LearningRate 0.0178   Epoch: 11   Global Step: 143480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:11,017-Speed 3010.24 samples/sec   Loss 6.2209   LearningRate 0.0178   Epoch: 11   Global Step: 143490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:14,381-Speed 3044.85 samples/sec   Loss 6.1551   LearningRate 0.0178   Epoch: 11   Global Step: 143500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:17,817-Speed 2981.25 samples/sec   Loss 6.0487   LearningRate 0.0178   Epoch: 11   Global Step: 143510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:21,215-Speed 3014.43 samples/sec   Loss 6.0909   LearningRate 0.0178   Epoch: 11   Global Step: 143520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:24,584-Speed 3039.95 samples/sec   Loss 6.1581   LearningRate 0.0178   Epoch: 11   Global Step: 143530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:27,955-Speed 3038.76 samples/sec   Loss 6.1221   LearningRate 0.0178   Epoch: 11   Global Step: 143540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:18:31,322-Speed 3042.25 samples/sec   Loss 6.1118   LearningRate 0.0178   Epoch: 11   Global Step: 143550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:34,704-Speed 3028.65 samples/sec   Loss 6.1340   LearningRate 0.0178   Epoch: 11   Global Step: 143560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:38,070-Speed 3043.38 samples/sec   Loss 6.1221   LearningRate 0.0178   Epoch: 11   Global Step: 143570   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:41,551-Speed 2943.40 samples/sec   Loss 6.1576   LearningRate 0.0178   Epoch: 11   Global Step: 143580   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:44,955-Speed 3008.57 samples/sec   Loss 6.1339   LearningRate 0.0178   Epoch: 11   Global Step: 143590   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:48,345-Speed 3022.12 samples/sec   Loss 6.1173   LearningRate 0.0178   Epoch: 11   Global Step: 143600   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:51,718-Speed 3036.92 samples/sec   Loss 6.1569   LearningRate 0.0178   Epoch: 11   Global Step: 143610   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:55,094-Speed 3033.49 samples/sec   Loss 6.0399   LearningRate 0.0178   Epoch: 11   Global Step: 143620   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:18:58,497-Speed 3010.74 samples/sec   Loss 6.0656   LearningRate 0.0178   Epoch: 11   Global Step: 143630   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:19:01,871-Speed 3035.10 samples/sec   Loss 6.1737   LearningRate 0.0178   Epoch: 11   Global Step: 143640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:19:05,218-Speed 3060.60 samples/sec   Loss 6.0755   LearningRate 0.0178   Epoch: 11   Global Step: 143650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:08,618-Speed 3012.31 samples/sec   Loss 6.2056   LearningRate 0.0178   Epoch: 11   Global Step: 143660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:11,997-Speed 3031.35 samples/sec   Loss 6.0731   LearningRate 0.0178   Epoch: 11   Global Step: 143670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:15,444-Speed 2971.40 samples/sec   Loss 6.0676   LearningRate 0.0178   Epoch: 11   Global Step: 143680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:18,807-Speed 3045.73 samples/sec   Loss 6.1984   LearningRate 0.0178   Epoch: 11   Global Step: 143690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:22,175-Speed 3041.19 samples/sec   Loss 6.1151   LearningRate 0.0178   Epoch: 11   Global Step: 143700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:25,587-Speed 3002.59 samples/sec   Loss 6.1242   LearningRate 0.0178   Epoch: 11   Global Step: 143710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:29,017-Speed 2986.04 samples/sec   Loss 6.0129   LearningRate 0.0178   Epoch: 11   Global Step: 143720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:32,429-Speed 3002.44 samples/sec   Loss 6.1052   LearningRate 0.0178   Epoch: 11   Global Step: 143730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:35,810-Speed 3029.56 samples/sec   Loss 6.0774   LearningRate 0.0178   Epoch: 11   Global Step: 143740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:39,159-Speed 3058.72 samples/sec   Loss 6.1345   LearningRate 0.0178   Epoch: 11   Global Step: 143750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:19:42,515-Speed 3051.72 samples/sec   Loss 6.1412   LearningRate 0.0177   Epoch: 11   Global Step: 143760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:45,839-Speed 3081.25 samples/sec   Loss 6.1144   LearningRate 0.0177   Epoch: 11   Global Step: 143770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:49,188-Speed 3059.11 samples/sec   Loss 6.0489   LearningRate 0.0177   Epoch: 11   Global Step: 143780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:52,595-Speed 3006.76 samples/sec   Loss 6.0812   LearningRate 0.0177   Epoch: 11   Global Step: 143790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:55,973-Speed 3032.48 samples/sec   Loss 5.9831   LearningRate 0.0177   Epoch: 11   Global Step: 143800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:19:59,374-Speed 3011.08 samples/sec   Loss 6.1247   LearningRate 0.0177   Epoch: 11   Global Step: 143810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:20:02,727-Speed 3055.22 samples/sec   Loss 6.1295   LearningRate 0.0177   Epoch: 11   Global Step: 143820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:20:06,074-Speed 3060.65 samples/sec   Loss 6.1079   LearningRate 0.0177   Epoch: 11   Global Step: 143830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:20:09,487-Speed 3001.38 samples/sec   Loss 5.9878   LearningRate 0.0177   Epoch: 11   Global Step: 143840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:20:12,905-Speed 2995.82 samples/sec   Loss 6.0575   LearningRate 0.0177   Epoch: 11   Global Step: 143850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:20:16,260-Speed 3052.88 samples/sec   Loss 6.2380   LearningRate 0.0177   Epoch: 11   Global Step: 143860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:20:19,695-Speed 2982.77 samples/sec   Loss 6.0806   LearningRate 0.0177   Epoch: 11   Global Step: 143870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:20:23,104-Speed 3004.65 samples/sec   Loss 6.0115   LearningRate 0.0177   Epoch: 11   Global Step: 143880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:20:26,437-Speed 3072.52 samples/sec   Loss 6.1164   LearningRate 0.0177   Epoch: 11   Global Step: 143890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:20:29,772-Speed 3071.40 samples/sec   Loss 6.1106   LearningRate 0.0177   Epoch: 11   Global Step: 143900   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:20:33,144-Speed 3037.63 samples/sec   Loss 6.1035   LearningRate 0.0177   Epoch: 11   Global Step: 143910   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:20:36,569-Speed 2990.87 samples/sec   Loss 6.1050   LearningRate 0.0177   Epoch: 11   Global Step: 143920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:20:40,038-Speed 2952.76 samples/sec   Loss 6.1154   LearningRate 0.0177   Epoch: 11   Global Step: 143930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:20:43,463-Speed 2990.25 samples/sec   Loss 6.0155   LearningRate 0.0177   Epoch: 11   Global Step: 143940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:20:46,881-Speed 2996.58 samples/sec   Loss 6.1956   LearningRate 0.0177   Epoch: 11   Global Step: 143950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:20:50,307-Speed 2989.68 samples/sec   Loss 5.9488   LearningRate 0.0177   Epoch: 11   Global Step: 143960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:20:53,829-Speed 2908.71 samples/sec   Loss 5.9668   LearningRate 0.0177   Epoch: 11   Global Step: 143970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:20:57,222-Speed 3018.16 samples/sec   Loss 6.0766   LearningRate 0.0177   Epoch: 11   Global Step: 143980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:21:00,644-Speed 2993.56 samples/sec   Loss 6.1871   LearningRate 0.0177   Epoch: 11   Global Step: 143990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:21:04,036-Speed 3019.99 samples/sec   Loss 6.1307   LearningRate 0.0177   Epoch: 11   Global Step: 144000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:07,398-Speed 3046.50 samples/sec   Loss 6.1146   LearningRate 0.0177   Epoch: 11   Global Step: 144010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:10,774-Speed 3035.27 samples/sec   Loss 6.1116   LearningRate 0.0177   Epoch: 11   Global Step: 144020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:14,142-Speed 3041.54 samples/sec   Loss 6.1395   LearningRate 0.0177   Epoch: 11   Global Step: 144030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:17,578-Speed 2981.29 samples/sec   Loss 6.0950   LearningRate 0.0177   Epoch: 11   Global Step: 144040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:20,980-Speed 3010.33 samples/sec   Loss 6.2880   LearningRate 0.0177   Epoch: 11   Global Step: 144050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:24,351-Speed 3039.06 samples/sec   Loss 6.1512   LearningRate 0.0176   Epoch: 11   Global Step: 144060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:27,760-Speed 3004.81 samples/sec   Loss 6.1656   LearningRate 0.0176   Epoch: 11   Global Step: 144070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:31,152-Speed 3019.30 samples/sec   Loss 6.0700   LearningRate 0.0176   Epoch: 11   Global Step: 144080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:34,492-Speed 3067.07 samples/sec   Loss 6.1390   LearningRate 0.0176   Epoch: 11   Global Step: 144090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:37,874-Speed 3028.48 samples/sec   Loss 6.1743   LearningRate 0.0176   Epoch: 11   Global Step: 144100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:21:41,359-Speed 2939.17 samples/sec   Loss 6.1442   LearningRate 0.0176   Epoch: 11   Global Step: 144110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:21:44,702-Speed 3064.47 samples/sec   Loss 6.1876   LearningRate 0.0176   Epoch: 11   Global Step: 144120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:48,068-Speed 3043.08 samples/sec   Loss 6.0151   LearningRate 0.0176   Epoch: 11   Global Step: 144130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:51,393-Speed 3080.72 samples/sec   Loss 6.0769   LearningRate 0.0176   Epoch: 11   Global Step: 144140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:54,791-Speed 3014.04 samples/sec   Loss 6.1003   LearningRate 0.0176   Epoch: 11   Global Step: 144150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:21:58,142-Speed 3056.95 samples/sec   Loss 6.0483   LearningRate 0.0176   Epoch: 11   Global Step: 144160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:22:01,448-Speed 3098.11 samples/sec   Loss 6.1207   LearningRate 0.0176   Epoch: 11   Global Step: 144170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:22:04,801-Speed 3054.68 samples/sec   Loss 6.2128   LearningRate 0.0176   Epoch: 11   Global Step: 144180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:22:08,134-Speed 3073.00 samples/sec   Loss 6.0535   LearningRate 0.0176   Epoch: 11   Global Step: 144190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:22:11,530-Speed 3016.40 samples/sec   Loss 6.1555   LearningRate 0.0176   Epoch: 11   Global Step: 144200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:22:14,833-Speed 3100.93 samples/sec   Loss 6.1185   LearningRate 0.0176   Epoch: 11   Global Step: 144210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:22:18,180-Speed 3060.30 samples/sec   Loss 5.9618   LearningRate 0.0176   Epoch: 11   Global Step: 144220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:22:21,586-Speed 3007.80 samples/sec   Loss 6.0833   LearningRate 0.0176   Epoch: 11   Global Step: 144230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:22:24,956-Speed 3039.44 samples/sec   Loss 6.1455   LearningRate 0.0176   Epoch: 11   Global Step: 144240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:22:28,328-Speed 3037.63 samples/sec   Loss 5.9486   LearningRate 0.0176   Epoch: 11   Global Step: 144250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:22:31,669-Speed 3065.22 samples/sec   Loss 6.0515   LearningRate 0.0176   Epoch: 11   Global Step: 144260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:22:35,026-Speed 3051.03 samples/sec   Loss 6.0305   LearningRate 0.0176   Epoch: 11   Global Step: 144270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:22:38,393-Speed 3042.96 samples/sec   Loss 6.1256   LearningRate 0.0176   Epoch: 11   Global Step: 144280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:22:41,781-Speed 3023.00 samples/sec   Loss 6.0967   LearningRate 0.0176   Epoch: 11   Global Step: 144290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:22:45,186-Speed 3008.62 samples/sec   Loss 6.1014   LearningRate 0.0176   Epoch: 11   Global Step: 144300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:22:48,649-Speed 2957.63 samples/sec   Loss 6.1764   LearningRate 0.0176   Epoch: 11   Global Step: 144310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:22:51,999-Speed 3058.41 samples/sec   Loss 6.1014   LearningRate 0.0176   Epoch: 11   Global Step: 144320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:22:55,355-Speed 3051.49 samples/sec   Loss 6.1039   LearningRate 0.0176   Epoch: 11   Global Step: 144330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:22:58,735-Speed 3030.91 samples/sec   Loss 6.0365   LearningRate 0.0176   Epoch: 11   Global Step: 144340   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:23:02,099-Speed 3044.85 samples/sec   Loss 6.2005   LearningRate 0.0176   Epoch: 11   Global Step: 144350   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:23:05,465-Speed 3042.19 samples/sec   Loss 6.1148   LearningRate 0.0175   Epoch: 11   Global Step: 144360   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:23:08,869-Speed 3009.49 samples/sec   Loss 6.0549   LearningRate 0.0175   Epoch: 11   Global Step: 144370   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:23:12,290-Speed 2994.43 samples/sec   Loss 6.0498   LearningRate 0.0175   Epoch: 11   Global Step: 144380   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:23:15,724-Speed 2982.47 samples/sec   Loss 6.1395   LearningRate 0.0175   Epoch: 11   Global Step: 144390   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:23:19,162-Speed 2979.44 samples/sec   Loss 6.0972   LearningRate 0.0175   Epoch: 11   Global Step: 144400   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:23:22,609-Speed 2971.25 samples/sec   Loss 6.0532   LearningRate 0.0175   Epoch: 11   Global Step: 144410   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:23:26,131-Speed 2908.58 samples/sec   Loss 6.2647   LearningRate 0.0175   Epoch: 11   Global Step: 144420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:23:29,592-Speed 2959.65 samples/sec   Loss 6.1990   LearningRate 0.0175   Epoch: 11   Global Step: 144430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:23:33,048-Speed 2963.64 samples/sec   Loss 6.0957   LearningRate 0.0175   Epoch: 11   Global Step: 144440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:23:36,534-Speed 2937.82 samples/sec   Loss 6.2183   LearningRate 0.0175   Epoch: 11   Global Step: 144450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:23:39,964-Speed 2986.59 samples/sec   Loss 6.2580   LearningRate 0.0175   Epoch: 11   Global Step: 144460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:23:43,312-Speed 3059.19 samples/sec   Loss 6.1225   LearningRate 0.0175   Epoch: 11   Global Step: 144470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:23:46,661-Speed 3058.82 samples/sec   Loss 6.0741   LearningRate 0.0175   Epoch: 11   Global Step: 144480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:23:50,080-Speed 2996.40 samples/sec   Loss 5.9617   LearningRate 0.0175   Epoch: 11   Global Step: 144490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:23:53,502-Speed 2992.78 samples/sec   Loss 6.0607   LearningRate 0.0175   Epoch: 11   Global Step: 144500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:23:56,887-Speed 3026.18 samples/sec   Loss 6.2096   LearningRate 0.0175   Epoch: 11   Global Step: 144510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:24:00,233-Speed 3061.10 samples/sec   Loss 6.1432   LearningRate 0.0175   Epoch: 11   Global Step: 144520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:24:03,554-Speed 3084.70 samples/sec   Loss 6.1145   LearningRate 0.0175   Epoch: 11   Global Step: 144530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:24:06,970-Speed 2998.56 samples/sec   Loss 6.1294   LearningRate 0.0175   Epoch: 11   Global Step: 144540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:24:10,351-Speed 3028.77 samples/sec   Loss 6.1957   LearningRate 0.0175   Epoch: 11   Global Step: 144550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:24:13,759-Speed 3006.24 samples/sec   Loss 6.2183   LearningRate 0.0175   Epoch: 11   Global Step: 144560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:24:17,180-Speed 2994.40 samples/sec   Loss 6.1735   LearningRate 0.0175   Epoch: 11   Global Step: 144570   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:24:20,617-Speed 2983.27 samples/sec   Loss 6.0379   LearningRate 0.0175   Epoch: 11   Global Step: 144580   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:24:24,031-Speed 3000.04 samples/sec   Loss 6.1898   LearningRate 0.0175   Epoch: 11   Global Step: 144590   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:24:27,444-Speed 3001.26 samples/sec   Loss 6.1304   LearningRate 0.0175   Epoch: 11   Global Step: 144600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:24:30,789-Speed 3062.04 samples/sec   Loss 6.0607   LearningRate 0.0175   Epoch: 11   Global Step: 144610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:24:34,154-Speed 3043.85 samples/sec   Loss 6.2271   LearningRate 0.0175   Epoch: 11   Global Step: 144620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:24:37,567-Speed 3001.08 samples/sec   Loss 5.9870   LearningRate 0.0175   Epoch: 11   Global Step: 144630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:24:40,920-Speed 3055.55 samples/sec   Loss 6.0429   LearningRate 0.0175   Epoch: 11   Global Step: 144640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:24:44,326-Speed 3007.28 samples/sec   Loss 6.0630   LearningRate 0.0174   Epoch: 11   Global Step: 144650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:24:47,701-Speed 3034.18 samples/sec   Loss 6.0494   LearningRate 0.0174   Epoch: 11   Global Step: 144660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:24:51,085-Speed 3027.94 samples/sec   Loss 6.0506   LearningRate 0.0174   Epoch: 11   Global Step: 144670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:24:54,460-Speed 3034.00 samples/sec   Loss 6.1433   LearningRate 0.0174   Epoch: 11   Global Step: 144680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:24:57,821-Speed 3047.80 samples/sec   Loss 6.0978   LearningRate 0.0174   Epoch: 11   Global Step: 144690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:01,174-Speed 3055.05 samples/sec   Loss 6.1221   LearningRate 0.0174   Epoch: 11   Global Step: 144700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:25:04,494-Speed 3085.13 samples/sec   Loss 6.1418   LearningRate 0.0174   Epoch: 11   Global Step: 144710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:25:07,805-Speed 3093.04 samples/sec   Loss 6.1355   LearningRate 0.0174   Epoch: 11   Global Step: 144720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:11,219-Speed 3000.36 samples/sec   Loss 6.0817   LearningRate 0.0174   Epoch: 11   Global Step: 144730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:14,541-Speed 3083.96 samples/sec   Loss 6.1988   LearningRate 0.0174   Epoch: 11   Global Step: 144740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:17,906-Speed 3043.43 samples/sec   Loss 6.0723   LearningRate 0.0174   Epoch: 11   Global Step: 144750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:21,267-Speed 3048.09 samples/sec   Loss 6.0851   LearningRate 0.0174   Epoch: 11   Global Step: 144760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:24,589-Speed 3083.11 samples/sec   Loss 5.9865   LearningRate 0.0174   Epoch: 11   Global Step: 144770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:27,948-Speed 3049.77 samples/sec   Loss 6.0357   LearningRate 0.0174   Epoch: 11   Global Step: 144780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:31,276-Speed 3077.87 samples/sec   Loss 6.0520   LearningRate 0.0174   Epoch: 11   Global Step: 144790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:34,616-Speed 3066.73 samples/sec   Loss 6.1203   LearningRate 0.0174   Epoch: 11   Global Step: 144800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:38,042-Speed 2989.60 samples/sec   Loss 6.1433   LearningRate 0.0174   Epoch: 11   Global Step: 144810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:41,479-Speed 2980.20 samples/sec   Loss 6.0590   LearningRate 0.0174   Epoch: 11   Global Step: 144820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:25:44,907-Speed 2988.48 samples/sec   Loss 6.0823   LearningRate 0.0174   Epoch: 11   Global Step: 144830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:25:48,345-Speed 2979.49 samples/sec   Loss 6.1311   LearningRate 0.0174   Epoch: 11   Global Step: 144840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:51,697-Speed 3057.00 samples/sec   Loss 6.0705   LearningRate 0.0174   Epoch: 11   Global Step: 144850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:55,104-Speed 3006.15 samples/sec   Loss 6.0132   LearningRate 0.0174   Epoch: 11   Global Step: 144860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:25:58,478-Speed 3035.86 samples/sec   Loss 6.1428   LearningRate 0.0174   Epoch: 11   Global Step: 144870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:01,883-Speed 3008.35 samples/sec   Loss 6.1258   LearningRate 0.0174   Epoch: 11   Global Step: 144880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:05,277-Speed 3017.82 samples/sec   Loss 6.0725   LearningRate 0.0174   Epoch: 11   Global Step: 144890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:08,647-Speed 3039.12 samples/sec   Loss 6.0919   LearningRate 0.0174   Epoch: 11   Global Step: 144900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:12,058-Speed 3002.60 samples/sec   Loss 6.0710   LearningRate 0.0174   Epoch: 11   Global Step: 144910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:15,434-Speed 3034.46 samples/sec   Loss 6.0872   LearningRate 0.0174   Epoch: 11   Global Step: 144920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:18,834-Speed 3012.91 samples/sec   Loss 6.0573   LearningRate 0.0174   Epoch: 11   Global Step: 144930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:22,155-Speed 3083.81 samples/sec   Loss 6.0486   LearningRate 0.0174   Epoch: 11   Global Step: 144940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:26:25,512-Speed 3050.87 samples/sec   Loss 6.1282   LearningRate 0.0173   Epoch: 11   Global Step: 144950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:26:28,932-Speed 2994.97 samples/sec   Loss 5.9848   LearningRate 0.0173   Epoch: 11   Global Step: 144960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:26:32,269-Speed 3069.90 samples/sec   Loss 6.1550   LearningRate 0.0173   Epoch: 11   Global Step: 144970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:35,701-Speed 2984.75 samples/sec   Loss 6.0636   LearningRate 0.0173   Epoch: 11   Global Step: 144980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:39,065-Speed 3045.07 samples/sec   Loss 6.1059   LearningRate 0.0173   Epoch: 11   Global Step: 144990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:42,403-Speed 3068.63 samples/sec   Loss 6.1357   LearningRate 0.0173   Epoch: 11   Global Step: 145000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:45,747-Speed 3062.51 samples/sec   Loss 6.0247   LearningRate 0.0173   Epoch: 11   Global Step: 145010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:49,074-Speed 3078.55 samples/sec   Loss 6.0352   LearningRate 0.0173   Epoch: 11   Global Step: 145020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:52,428-Speed 3054.62 samples/sec   Loss 6.0734   LearningRate 0.0173   Epoch: 11   Global Step: 145030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:55,868-Speed 2977.54 samples/sec   Loss 6.0555   LearningRate 0.0173   Epoch: 11   Global Step: 145040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:26:59,237-Speed 3040.47 samples/sec   Loss 6.0610   LearningRate 0.0173   Epoch: 11   Global Step: 145050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:27:02,628-Speed 3020.41 samples/sec   Loss 6.0776   LearningRate 0.0173   Epoch: 11   Global Step: 145060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:27:06,022-Speed 3018.13 samples/sec   Loss 6.0187   LearningRate 0.0173   Epoch: 11   Global Step: 145070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:27:09,469-Speed 2972.05 samples/sec   Loss 6.2080   LearningRate 0.0173   Epoch: 11   Global Step: 145080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:27:12,796-Speed 3078.93 samples/sec   Loss 6.0336   LearningRate 0.0173   Epoch: 11   Global Step: 145090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:27:16,171-Speed 3034.60 samples/sec   Loss 6.0871   LearningRate 0.0173   Epoch: 11   Global Step: 145100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:27:19,578-Speed 3006.28 samples/sec   Loss 6.1175   LearningRate 0.0173   Epoch: 11   Global Step: 145110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:27:23,093-Speed 2914.13 samples/sec   Loss 6.1783   LearningRate 0.0173   Epoch: 11   Global Step: 145120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:27:26,428-Speed 3071.01 samples/sec   Loss 6.1547   LearningRate 0.0173   Epoch: 11   Global Step: 145130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:27:29,775-Speed 3060.37 samples/sec   Loss 6.1568   LearningRate 0.0173   Epoch: 11   Global Step: 145140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:27:33,189-Speed 3000.35 samples/sec   Loss 6.1934   LearningRate 0.0173   Epoch: 11   Global Step: 145150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:27:36,552-Speed 3046.25 samples/sec   Loss 6.1117   LearningRate 0.0173   Epoch: 11   Global Step: 145160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:27:39,978-Speed 2988.98 samples/sec   Loss 6.1765   LearningRate 0.0173   Epoch: 11   Global Step: 145170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:27:43,383-Speed 3009.33 samples/sec   Loss 6.0125   LearningRate 0.0173   Epoch: 11   Global Step: 145180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:27:46,751-Speed 3040.88 samples/sec   Loss 6.1400   LearningRate 0.0173   Epoch: 11   Global Step: 145190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:27:50,169-Speed 2997.75 samples/sec   Loss 6.0534   LearningRate 0.0173   Epoch: 11   Global Step: 145200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:27:53,543-Speed 3035.24 samples/sec   Loss 6.0550   LearningRate 0.0173   Epoch: 11   Global Step: 145210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:27:56,911-Speed 3041.73 samples/sec   Loss 6.0107   LearningRate 0.0173   Epoch: 11   Global Step: 145220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:00,241-Speed 3076.18 samples/sec   Loss 6.1006   LearningRate 0.0173   Epoch: 11   Global Step: 145230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:03,570-Speed 3076.80 samples/sec   Loss 6.0565   LearningRate 0.0173   Epoch: 11   Global Step: 145240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:06,988-Speed 2996.59 samples/sec   Loss 6.1443   LearningRate 0.0172   Epoch: 11   Global Step: 145250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:10,389-Speed 3012.07 samples/sec   Loss 5.9896   LearningRate 0.0172   Epoch: 11   Global Step: 145260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:13,749-Speed 3048.51 samples/sec   Loss 6.1093   LearningRate 0.0172   Epoch: 11   Global Step: 145270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:17,163-Speed 3000.07 samples/sec   Loss 6.0081   LearningRate 0.0172   Epoch: 11   Global Step: 145280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:20,538-Speed 3034.85 samples/sec   Loss 6.1635   LearningRate 0.0172   Epoch: 11   Global Step: 145290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:23,909-Speed 3038.42 samples/sec   Loss 6.0447   LearningRate 0.0172   Epoch: 11   Global Step: 145300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:27,243-Speed 3072.37 samples/sec   Loss 6.0046   LearningRate 0.0172   Epoch: 11   Global Step: 145310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:30,625-Speed 3028.42 samples/sec   Loss 6.0065   LearningRate 0.0172   Epoch: 11   Global Step: 145320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:34,024-Speed 3013.39 samples/sec   Loss 6.0525   LearningRate 0.0172   Epoch: 11   Global Step: 145330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:37,379-Speed 3053.34 samples/sec   Loss 6.0019   LearningRate 0.0172   Epoch: 11   Global Step: 145340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:40,775-Speed 3016.16 samples/sec   Loss 6.0413   LearningRate 0.0172   Epoch: 11   Global Step: 145350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:44,211-Speed 2981.15 samples/sec   Loss 6.1410   LearningRate 0.0172   Epoch: 11   Global Step: 145360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:28:47,648-Speed 2980.47 samples/sec   Loss 6.1107   LearningRate 0.0172   Epoch: 11   Global Step: 145370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:28:51,003-Speed 3053.08 samples/sec   Loss 6.0406   LearningRate 0.0172   Epoch: 11   Global Step: 145380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:28:54,372-Speed 3042.59 samples/sec   Loss 5.9460   LearningRate 0.0172   Epoch: 11   Global Step: 145390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:28:57,829-Speed 2962.75 samples/sec   Loss 6.1352   LearningRate 0.0172   Epoch: 11   Global Step: 145400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:01,219-Speed 3021.44 samples/sec   Loss 6.0164   LearningRate 0.0172   Epoch: 11   Global Step: 145410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:04,599-Speed 3030.97 samples/sec   Loss 6.0606   LearningRate 0.0172   Epoch: 11   Global Step: 145420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:07,920-Speed 3084.15 samples/sec   Loss 6.1181   LearningRate 0.0172   Epoch: 11   Global Step: 145430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:11,378-Speed 2962.08 samples/sec   Loss 5.9838   LearningRate 0.0172   Epoch: 11   Global Step: 145440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:14,711-Speed 3073.96 samples/sec   Loss 6.0630   LearningRate 0.0172   Epoch: 11   Global Step: 145450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:18,135-Speed 2991.15 samples/sec   Loss 6.0653   LearningRate 0.0172   Epoch: 11   Global Step: 145460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:21,582-Speed 2971.56 samples/sec   Loss 6.0486   LearningRate 0.0172   Epoch: 11   Global Step: 145470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:29:24,989-Speed 3006.18 samples/sec   Loss 6.1774   LearningRate 0.0172   Epoch: 11   Global Step: 145480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:28,400-Speed 3003.67 samples/sec   Loss 6.1281   LearningRate 0.0172   Epoch: 11   Global Step: 145490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:31,806-Speed 3007.34 samples/sec   Loss 6.1732   LearningRate 0.0172   Epoch: 11   Global Step: 145500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:35,264-Speed 2961.91 samples/sec   Loss 6.1249   LearningRate 0.0172   Epoch: 11   Global Step: 145510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:38,637-Speed 3037.11 samples/sec   Loss 6.0925   LearningRate 0.0172   Epoch: 11   Global Step: 145520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:42,024-Speed 3024.49 samples/sec   Loss 6.1094   LearningRate 0.0172   Epoch: 11   Global Step: 145530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:45,422-Speed 3014.40 samples/sec   Loss 6.0815   LearningRate 0.0172   Epoch: 11   Global Step: 145540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:48,852-Speed 2986.54 samples/sec   Loss 5.9962   LearningRate 0.0171   Epoch: 11   Global Step: 145550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:52,245-Speed 3018.15 samples/sec   Loss 6.0578   LearningRate 0.0171   Epoch: 11   Global Step: 145560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:55,654-Speed 3005.14 samples/sec   Loss 6.1184   LearningRate 0.0171   Epoch: 11   Global Step: 145570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:29:59,057-Speed 3010.17 samples/sec   Loss 6.1660   LearningRate 0.0171   Epoch: 11   Global Step: 145580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:30:02,413-Speed 3051.84 samples/sec   Loss 6.0484   LearningRate 0.0171   Epoch: 11   Global Step: 145590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:30:05,840-Speed 2989.12 samples/sec   Loss 6.0725   LearningRate 0.0171   Epoch: 11   Global Step: 145600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:30:09,200-Speed 3048.02 samples/sec   Loss 6.1447   LearningRate 0.0171   Epoch: 11   Global Step: 145610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:30:12,583-Speed 3028.53 samples/sec   Loss 6.0404   LearningRate 0.0171   Epoch: 11   Global Step: 145620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:30:15,959-Speed 3033.63 samples/sec   Loss 6.1058   LearningRate 0.0171   Epoch: 11   Global Step: 145630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:30:19,308-Speed 3058.77 samples/sec   Loss 6.0061   LearningRate 0.0171   Epoch: 11   Global Step: 145640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:30:22,702-Speed 3017.89 samples/sec   Loss 6.0327   LearningRate 0.0171   Epoch: 11   Global Step: 145650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:30:26,052-Speed 3057.92 samples/sec   Loss 6.0311   LearningRate 0.0171   Epoch: 11   Global Step: 145660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:30:29,440-Speed 3023.69 samples/sec   Loss 5.8724   LearningRate 0.0171   Epoch: 11   Global Step: 145670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:30:32,775-Speed 3071.04 samples/sec   Loss 6.1473   LearningRate 0.0171   Epoch: 11   Global Step: 145680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:30:36,136-Speed 3047.03 samples/sec   Loss 5.9815   LearningRate 0.0171   Epoch: 11   Global Step: 145690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:30:39,508-Speed 3038.37 samples/sec   Loss 6.0987   LearningRate 0.0171   Epoch: 11   Global Step: 145700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:30:42,927-Speed 2995.23 samples/sec   Loss 6.0395   LearningRate 0.0171   Epoch: 11   Global Step: 145710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:30:46,245-Speed 3087.23 samples/sec   Loss 6.0249   LearningRate 0.0171   Epoch: 11   Global Step: 145720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:30:49,587-Speed 3065.19 samples/sec   Loss 6.0603   LearningRate 0.0171   Epoch: 11   Global Step: 145730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:30:53,018-Speed 2985.91 samples/sec   Loss 6.0588   LearningRate 0.0171   Epoch: 11   Global Step: 145740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:30:56,331-Speed 3091.43 samples/sec   Loss 6.1412   LearningRate 0.0171   Epoch: 11   Global Step: 145750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:30:59,672-Speed 3065.70 samples/sec   Loss 6.1552   LearningRate 0.0171   Epoch: 11   Global Step: 145760   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:31:03,048-Speed 3034.64 samples/sec   Loss 6.1127   LearningRate 0.0171   Epoch: 11   Global Step: 145770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:31:06,462-Speed 2999.86 samples/sec   Loss 6.0209   LearningRate 0.0171   Epoch: 11   Global Step: 145780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:31:09,782-Speed 3084.84 samples/sec   Loss 5.9978   LearningRate 0.0171   Epoch: 11   Global Step: 145790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:31:13,193-Speed 3003.01 samples/sec   Loss 6.0958   LearningRate 0.0171   Epoch: 11   Global Step: 145800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:31:16,626-Speed 2984.27 samples/sec   Loss 6.0665   LearningRate 0.0171   Epoch: 11   Global Step: 145810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:31:20,084-Speed 2961.93 samples/sec   Loss 6.0521   LearningRate 0.0171   Epoch: 11   Global Step: 145820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:31:23,483-Speed 3014.07 samples/sec   Loss 6.1786   LearningRate 0.0171   Epoch: 11   Global Step: 145830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:31:26,910-Speed 2988.40 samples/sec   Loss 6.0285   LearningRate 0.0171   Epoch: 11   Global Step: 145840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:31:30,264-Speed 3054.00 samples/sec   Loss 6.0736   LearningRate 0.0170   Epoch: 11   Global Step: 145850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:31:33,779-Speed 2914.95 samples/sec   Loss 6.0448   LearningRate 0.0170   Epoch: 11   Global Step: 145860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:31:37,081-Speed 3102.63 samples/sec   Loss 6.0540   LearningRate 0.0170   Epoch: 11   Global Step: 145870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:31:40,511-Speed 2985.88 samples/sec   Loss 6.0173   LearningRate 0.0170   Epoch: 11   Global Step: 145880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:31:43,857-Speed 3061.23 samples/sec   Loss 6.0645   LearningRate 0.0170   Epoch: 11   Global Step: 145890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:31:47,320-Speed 2957.79 samples/sec   Loss 6.0017   LearningRate 0.0170   Epoch: 11   Global Step: 145900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:31:50,769-Speed 2970.01 samples/sec   Loss 5.9940   LearningRate 0.0170   Epoch: 11   Global Step: 145910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:31:54,116-Speed 3060.67 samples/sec   Loss 6.0718   LearningRate 0.0170   Epoch: 11   Global Step: 145920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:31:57,565-Speed 2969.92 samples/sec   Loss 5.9767   LearningRate 0.0170   Epoch: 11   Global Step: 145930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:32:00,917-Speed 3055.94 samples/sec   Loss 6.0454   LearningRate 0.0170   Epoch: 11   Global Step: 145940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:32:04,314-Speed 3015.38 samples/sec   Loss 6.1114   LearningRate 0.0170   Epoch: 11   Global Step: 145950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:32:07,756-Speed 2976.04 samples/sec   Loss 6.0259   LearningRate 0.0170   Epoch: 11   Global Step: 145960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:32:11,224-Speed 2953.42 samples/sec   Loss 6.0571   LearningRate 0.0170   Epoch: 11   Global Step: 145970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:32:14,691-Speed 2954.05 samples/sec   Loss 6.0178   LearningRate 0.0170   Epoch: 11   Global Step: 145980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:32:18,100-Speed 3004.75 samples/sec   Loss 5.9854   LearningRate 0.0170   Epoch: 11   Global Step: 145990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:32:21,525-Speed 2991.19 samples/sec   Loss 6.0537   LearningRate 0.0170   Epoch: 11   Global Step: 146000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:32:24,882-Speed 3051.00 samples/sec   Loss 5.9832   LearningRate 0.0170   Epoch: 11   Global Step: 146010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:32:28,280-Speed 3014.67 samples/sec   Loss 5.9861   LearningRate 0.0170   Epoch: 11   Global Step: 146020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:32:31,694-Speed 2999.85 samples/sec   Loss 6.1673   LearningRate 0.0170   Epoch: 11   Global Step: 146030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:32:35,061-Speed 3042.46 samples/sec   Loss 5.9956   LearningRate 0.0170   Epoch: 11   Global Step: 146040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:32:38,390-Speed 3077.21 samples/sec   Loss 6.0543   LearningRate 0.0170   Epoch: 11   Global Step: 146050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:32:41,754-Speed 3044.24 samples/sec   Loss 6.0434   LearningRate 0.0170   Epoch: 11   Global Step: 146060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:32:45,060-Speed 3098.53 samples/sec   Loss 6.0731   LearningRate 0.0170   Epoch: 11   Global Step: 146070   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:32:48,452-Speed 3019.64 samples/sec   Loss 5.9573   LearningRate 0.0170   Epoch: 11   Global Step: 146080   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:32:51,779-Speed 3078.51 samples/sec   Loss 6.0534   LearningRate 0.0170   Epoch: 11   Global Step: 146090   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:32:55,227-Speed 2971.48 samples/sec   Loss 6.0970   LearningRate 0.0170   Epoch: 11   Global Step: 146100   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:32:58,570-Speed 3063.48 samples/sec   Loss 6.0198   LearningRate 0.0170   Epoch: 11   Global Step: 146110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:01,928-Speed 3050.05 samples/sec   Loss 5.9693   LearningRate 0.0170   Epoch: 11   Global Step: 146120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:05,369-Speed 2977.24 samples/sec   Loss 6.0822   LearningRate 0.0170   Epoch: 11   Global Step: 146130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:08,770-Speed 3011.37 samples/sec   Loss 6.0764   LearningRate 0.0170   Epoch: 11   Global Step: 146140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:12,109-Speed 3068.35 samples/sec   Loss 6.0282   LearningRate 0.0169   Epoch: 11   Global Step: 146150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:15,575-Speed 2955.51 samples/sec   Loss 6.1045   LearningRate 0.0169   Epoch: 11   Global Step: 146160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:19,013-Speed 2979.01 samples/sec   Loss 5.9877   LearningRate 0.0169   Epoch: 11   Global Step: 146170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:22,369-Speed 3052.22 samples/sec   Loss 6.0232   LearningRate 0.0169   Epoch: 11   Global Step: 146180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:25,738-Speed 3039.98 samples/sec   Loss 6.0780   LearningRate 0.0169   Epoch: 11   Global Step: 146190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:29,124-Speed 3025.32 samples/sec   Loss 6.0846   LearningRate 0.0169   Epoch: 11   Global Step: 146200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:32,523-Speed 3013.34 samples/sec   Loss 5.9751   LearningRate 0.0169   Epoch: 11   Global Step: 146210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:33:35,904-Speed 3029.81 samples/sec   Loss 6.1199   LearningRate 0.0169   Epoch: 11   Global Step: 146220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:33:39,274-Speed 3039.39 samples/sec   Loss 6.0684   LearningRate 0.0169   Epoch: 11   Global Step: 146230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:33:42,655-Speed 3029.34 samples/sec   Loss 5.9903   LearningRate 0.0169   Epoch: 11   Global Step: 146240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:33:46,040-Speed 3026.43 samples/sec   Loss 5.9853   LearningRate 0.0169   Epoch: 11   Global Step: 146250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:49,375-Speed 3071.49 samples/sec   Loss 5.9659   LearningRate 0.0169   Epoch: 11   Global Step: 146260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:52,856-Speed 2942.63 samples/sec   Loss 6.0874   LearningRate 0.0169   Epoch: 11   Global Step: 146270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:56,344-Speed 2935.95 samples/sec   Loss 6.0138   LearningRate 0.0169   Epoch: 11   Global Step: 146280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:33:59,807-Speed 2958.40 samples/sec   Loss 6.0381   LearningRate 0.0169   Epoch: 11   Global Step: 146290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:03,174-Speed 3042.21 samples/sec   Loss 5.9620   LearningRate 0.0169   Epoch: 11   Global Step: 146300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:06,562-Speed 3023.08 samples/sec   Loss 6.0319   LearningRate 0.0169   Epoch: 11   Global Step: 146310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:09,970-Speed 3005.31 samples/sec   Loss 6.0246   LearningRate 0.0169   Epoch: 11   Global Step: 146320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:13,343-Speed 3037.11 samples/sec   Loss 6.0312   LearningRate 0.0169   Epoch: 11   Global Step: 146330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:16,670-Speed 3078.07 samples/sec   Loss 6.1115   LearningRate 0.0169   Epoch: 11   Global Step: 146340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:20,018-Speed 3059.48 samples/sec   Loss 5.9962   LearningRate 0.0169   Epoch: 11   Global Step: 146350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:34:23,375-Speed 3051.40 samples/sec   Loss 6.0812   LearningRate 0.0169   Epoch: 11   Global Step: 146360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:34:26,739-Speed 3044.51 samples/sec   Loss 6.0450   LearningRate 0.0169   Epoch: 11   Global Step: 146370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:30,162-Speed 2992.23 samples/sec   Loss 6.1035   LearningRate 0.0169   Epoch: 11   Global Step: 146380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:33,498-Speed 3071.07 samples/sec   Loss 5.9117   LearningRate 0.0169   Epoch: 11   Global Step: 146390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:36,827-Speed 3075.98 samples/sec   Loss 5.9728   LearningRate 0.0169   Epoch: 11   Global Step: 146400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:40,192-Speed 3043.85 samples/sec   Loss 6.0898   LearningRate 0.0169   Epoch: 11   Global Step: 146410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:43,554-Speed 3047.31 samples/sec   Loss 5.9258   LearningRate 0.0169   Epoch: 11   Global Step: 146420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:46,922-Speed 3041.32 samples/sec   Loss 6.0113   LearningRate 0.0169   Epoch: 11   Global Step: 146430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:50,373-Speed 2967.87 samples/sec   Loss 6.0620   LearningRate 0.0169   Epoch: 11   Global Step: 146440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:53,780-Speed 3006.75 samples/sec   Loss 6.1128   LearningRate 0.0168   Epoch: 11   Global Step: 146450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:34:57,172-Speed 3019.24 samples/sec   Loss 6.0012   LearningRate 0.0168   Epoch: 11   Global Step: 146460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:35:00,536-Speed 3045.13 samples/sec   Loss 6.0649   LearningRate 0.0168   Epoch: 11   Global Step: 146470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:35:04,685-Speed 2468.03 samples/sec   Loss 6.1393   LearningRate 0.0168   Epoch: 11   Global Step: 146480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:35:08,100-Speed 2999.96 samples/sec   Loss 6.0820   LearningRate 0.0168   Epoch: 11   Global Step: 146490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:35:11,419-Speed 3086.38 samples/sec   Loss 6.0619   LearningRate 0.0168   Epoch: 11   Global Step: 146500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:35:14,770-Speed 3056.25 samples/sec   Loss 5.9499   LearningRate 0.0168   Epoch: 11   Global Step: 146510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:35:18,092-Speed 3083.23 samples/sec   Loss 6.0821   LearningRate 0.0168   Epoch: 11   Global Step: 146520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:35:21,514-Speed 2993.17 samples/sec   Loss 6.0985   LearningRate 0.0168   Epoch: 11   Global Step: 146530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:35:24,952-Speed 2979.60 samples/sec   Loss 6.0580   LearningRate 0.0168   Epoch: 11   Global Step: 146540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:35:28,340-Speed 3023.17 samples/sec   Loss 6.1284   LearningRate 0.0168   Epoch: 11   Global Step: 146550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:35:31,698-Speed 3050.61 samples/sec   Loss 6.0912   LearningRate 0.0168   Epoch: 11   Global Step: 146560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:35:35,021-Speed 3082.11 samples/sec   Loss 6.0597   LearningRate 0.0168   Epoch: 11   Global Step: 146570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:35:38,359-Speed 3069.08 samples/sec   Loss 5.9574   LearningRate 0.0168   Epoch: 11   Global Step: 146580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:35:41,845-Speed 2938.16 samples/sec   Loss 5.9204   LearningRate 0.0168   Epoch: 11   Global Step: 146590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:35:45,234-Speed 3022.26 samples/sec   Loss 6.2070   LearningRate 0.0168   Epoch: 11   Global Step: 146600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:35:48,590-Speed 3052.17 samples/sec   Loss 6.1292   LearningRate 0.0168   Epoch: 11   Global Step: 146610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:35:51,948-Speed 3050.18 samples/sec   Loss 6.0506   LearningRate 0.0168   Epoch: 11   Global Step: 146620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:35:55,371-Speed 2992.46 samples/sec   Loss 6.0274   LearningRate 0.0168   Epoch: 11   Global Step: 146630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:35:58,706-Speed 3071.03 samples/sec   Loss 6.0327   LearningRate 0.0168   Epoch: 11   Global Step: 146640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:36:02,106-Speed 3013.37 samples/sec   Loss 6.0028   LearningRate 0.0168   Epoch: 11   Global Step: 146650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:36:05,416-Speed 3094.99 samples/sec   Loss 5.9069   LearningRate 0.0168   Epoch: 11   Global Step: 146660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:36:08,847-Speed 2985.46 samples/sec   Loss 6.0117   LearningRate 0.0168   Epoch: 11   Global Step: 146670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:36:12,188-Speed 3065.22 samples/sec   Loss 6.0329   LearningRate 0.0168   Epoch: 11   Global Step: 146680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:36:15,516-Speed 3078.10 samples/sec   Loss 6.0573   LearningRate 0.0168   Epoch: 11   Global Step: 146690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:36:18,891-Speed 3035.01 samples/sec   Loss 5.9835   LearningRate 0.0168   Epoch: 11   Global Step: 146700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:36:22,231-Speed 3066.40 samples/sec   Loss 6.1966   LearningRate 0.0168   Epoch: 11   Global Step: 146710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:36:25,652-Speed 2994.04 samples/sec   Loss 6.0566   LearningRate 0.0168   Epoch: 11   Global Step: 146720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:36:29,021-Speed 3040.56 samples/sec   Loss 5.9146   LearningRate 0.0168   Epoch: 11   Global Step: 146730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:36:32,363-Speed 3064.86 samples/sec   Loss 6.0130   LearningRate 0.0168   Epoch: 11   Global Step: 146740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:36:35,734-Speed 3038.59 samples/sec   Loss 6.1022   LearningRate 0.0167   Epoch: 11   Global Step: 146750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:36:39,060-Speed 3079.19 samples/sec   Loss 6.0278   LearningRate 0.0167   Epoch: 11   Global Step: 146760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:36:42,449-Speed 3022.86 samples/sec   Loss 5.8971   LearningRate 0.0167   Epoch: 11   Global Step: 146770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:36:45,926-Speed 2945.79 samples/sec   Loss 6.0432   LearningRate 0.0167   Epoch: 11   Global Step: 146780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:36:49,244-Speed 3086.90 samples/sec   Loss 5.9677   LearningRate 0.0167   Epoch: 11   Global Step: 146790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:36:52,604-Speed 3048.23 samples/sec   Loss 5.9460   LearningRate 0.0167   Epoch: 11   Global Step: 146800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:36:55,943-Speed 3068.14 samples/sec   Loss 6.0159   LearningRate 0.0167   Epoch: 11   Global Step: 146810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:36:59,284-Speed 3065.70 samples/sec   Loss 5.9400   LearningRate 0.0167   Epoch: 11   Global Step: 146820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:02,707-Speed 2992.98 samples/sec   Loss 5.8866   LearningRate 0.0167   Epoch: 11   Global Step: 146830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:06,151-Speed 2973.96 samples/sec   Loss 6.0858   LearningRate 0.0167   Epoch: 11   Global Step: 146840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:09,517-Speed 3042.63 samples/sec   Loss 6.1207   LearningRate 0.0167   Epoch: 11   Global Step: 146850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:12,896-Speed 3031.54 samples/sec   Loss 5.9175   LearningRate 0.0167   Epoch: 11   Global Step: 146860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:37:16,282-Speed 3024.63 samples/sec   Loss 6.0277   LearningRate 0.0167   Epoch: 11   Global Step: 146870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:19,671-Speed 3022.72 samples/sec   Loss 5.9858   LearningRate 0.0167   Epoch: 11   Global Step: 146880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:23,078-Speed 3006.36 samples/sec   Loss 5.9226   LearningRate 0.0167   Epoch: 11   Global Step: 146890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:26,412-Speed 3072.76 samples/sec   Loss 5.9630   LearningRate 0.0167   Epoch: 11   Global Step: 146900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:29,785-Speed 3036.81 samples/sec   Loss 5.9368   LearningRate 0.0167   Epoch: 11   Global Step: 146910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:33,170-Speed 3025.57 samples/sec   Loss 5.9728   LearningRate 0.0167   Epoch: 11   Global Step: 146920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:36,527-Speed 3051.87 samples/sec   Loss 6.0841   LearningRate 0.0167   Epoch: 11   Global Step: 146930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:39,854-Speed 3078.29 samples/sec   Loss 6.0512   LearningRate 0.0167   Epoch: 11   Global Step: 146940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:43,245-Speed 3021.23 samples/sec   Loss 6.0490   LearningRate 0.0167   Epoch: 11   Global Step: 146950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:46,611-Speed 3042.86 samples/sec   Loss 5.9340   LearningRate 0.0167   Epoch: 11   Global Step: 146960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:37:50,107-Speed 2929.70 samples/sec   Loss 6.0676   LearningRate 0.0167   Epoch: 11   Global Step: 146970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:37:53,481-Speed 3035.97 samples/sec   Loss 5.9888   LearningRate 0.0167   Epoch: 11   Global Step: 146980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:37:56,828-Speed 3060.81 samples/sec   Loss 6.0403   LearningRate 0.0167   Epoch: 11   Global Step: 146990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:38:00,157-Speed 3076.36 samples/sec   Loss 6.1022   LearningRate 0.0167   Epoch: 11   Global Step: 147000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:38:03,511-Speed 3054.01 samples/sec   Loss 5.9274   LearningRate 0.0167   Epoch: 11   Global Step: 147010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:38:06,991-Speed 2944.21 samples/sec   Loss 6.0037   LearningRate 0.0167   Epoch: 11   Global Step: 147020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:38:10,440-Speed 2969.16 samples/sec   Loss 5.9282   LearningRate 0.0167   Epoch: 11   Global Step: 147030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:38:13,774-Speed 3072.16 samples/sec   Loss 6.0315   LearningRate 0.0167   Epoch: 11   Global Step: 147040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:38:17,193-Speed 2996.30 samples/sec   Loss 6.0055   LearningRate 0.0167   Epoch: 11   Global Step: 147050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:38:20,566-Speed 3036.49 samples/sec   Loss 6.0581   LearningRate 0.0166   Epoch: 11   Global Step: 147060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:38:23,942-Speed 3034.25 samples/sec   Loss 5.9784   LearningRate 0.0166   Epoch: 11   Global Step: 147070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:38:27,313-Speed 3038.50 samples/sec   Loss 6.0364   LearningRate 0.0166   Epoch: 11   Global Step: 147080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:38:30,729-Speed 2998.60 samples/sec   Loss 5.8301   LearningRate 0.0166   Epoch: 11   Global Step: 147090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:38:34,212-Speed 2942.19 samples/sec   Loss 5.9717   LearningRate 0.0166   Epoch: 11   Global Step: 147100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:38:37,609-Speed 3015.07 samples/sec   Loss 6.1123   LearningRate 0.0166   Epoch: 11   Global Step: 147110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:38:40,932-Speed 3082.41 samples/sec   Loss 5.8687   LearningRate 0.0166   Epoch: 11   Global Step: 147120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:38:44,335-Speed 3010.17 samples/sec   Loss 5.8952   LearningRate 0.0166   Epoch: 11   Global Step: 147130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:38:47,750-Speed 2999.20 samples/sec   Loss 6.0063   LearningRate 0.0166   Epoch: 11   Global Step: 147140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:38:51,102-Speed 3056.34 samples/sec   Loss 6.0700   LearningRate 0.0166   Epoch: 11   Global Step: 147150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:38:54,453-Speed 3055.95 samples/sec   Loss 5.9762   LearningRate 0.0166   Epoch: 11   Global Step: 147160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:38:57,830-Speed 3033.62 samples/sec   Loss 6.0371   LearningRate 0.0166   Epoch: 11   Global Step: 147170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:39:01,252-Speed 2994.26 samples/sec   Loss 5.8969   LearningRate 0.0166   Epoch: 11   Global Step: 147180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:39:04,699-Speed 2971.30 samples/sec   Loss 6.0244   LearningRate 0.0166   Epoch: 11   Global Step: 147190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:39:08,103-Speed 3009.04 samples/sec   Loss 5.9351   LearningRate 0.0166   Epoch: 11   Global Step: 147200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:39:11,477-Speed 3035.64 samples/sec   Loss 6.0950   LearningRate 0.0166   Epoch: 11   Global Step: 147210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:39:14,914-Speed 2980.39 samples/sec   Loss 6.0239   LearningRate 0.0166   Epoch: 11   Global Step: 147220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:39:18,287-Speed 3037.16 samples/sec   Loss 6.1138   LearningRate 0.0166   Epoch: 11   Global Step: 147230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:39:21,661-Speed 3035.89 samples/sec   Loss 5.9639   LearningRate 0.0166   Epoch: 11   Global Step: 147240   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:39:25,058-Speed 3015.45 samples/sec   Loss 6.0172   LearningRate 0.0166   Epoch: 11   Global Step: 147250   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:39:28,450-Speed 3019.25 samples/sec   Loss 5.9515   LearningRate 0.0166   Epoch: 11   Global Step: 147260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:39:31,853-Speed 3010.26 samples/sec   Loss 5.9737   LearningRate 0.0166   Epoch: 11   Global Step: 147270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:39:35,222-Speed 3040.51 samples/sec   Loss 5.8854   LearningRate 0.0166   Epoch: 11   Global Step: 147280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:39:38,590-Speed 3041.38 samples/sec   Loss 6.1404   LearningRate 0.0166   Epoch: 11   Global Step: 147290   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:39:42,041-Speed 2967.88 samples/sec   Loss 6.0275   LearningRate 0.0166   Epoch: 11   Global Step: 147300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:39:45,450-Speed 3004.56 samples/sec   Loss 5.8271   LearningRate 0.0166   Epoch: 11   Global Step: 147310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:39:48,905-Speed 2964.86 samples/sec   Loss 6.0194   LearningRate 0.0166   Epoch: 11   Global Step: 147320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:39:52,302-Speed 3014.60 samples/sec   Loss 5.9405   LearningRate 0.0166   Epoch: 11   Global Step: 147330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:39:55,655-Speed 3055.30 samples/sec   Loss 5.9187   LearningRate 0.0166   Epoch: 11   Global Step: 147340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:39:59,037-Speed 3028.19 samples/sec   Loss 6.0554   LearningRate 0.0166   Epoch: 11   Global Step: 147350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:02,457-Speed 2995.54 samples/sec   Loss 6.0617   LearningRate 0.0165   Epoch: 11   Global Step: 147360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:05,861-Speed 3008.48 samples/sec   Loss 6.0303   LearningRate 0.0165   Epoch: 11   Global Step: 147370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:09,302-Speed 2977.03 samples/sec   Loss 5.9736   LearningRate 0.0165   Epoch: 11   Global Step: 147380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:12,716-Speed 3000.06 samples/sec   Loss 5.9928   LearningRate 0.0165   Epoch: 11   Global Step: 147390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:16,087-Speed 3038.80 samples/sec   Loss 5.9107   LearningRate 0.0165   Epoch: 11   Global Step: 147400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:40:19,485-Speed 3014.31 samples/sec   Loss 5.9500   LearningRate 0.0165   Epoch: 11   Global Step: 147410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:22,859-Speed 3035.54 samples/sec   Loss 5.9110   LearningRate 0.0165   Epoch: 11   Global Step: 147420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:26,188-Speed 3077.17 samples/sec   Loss 6.1229   LearningRate 0.0165   Epoch: 11   Global Step: 147430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:29,593-Speed 3007.81 samples/sec   Loss 5.9490   LearningRate 0.0165   Epoch: 11   Global Step: 147440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:32,947-Speed 3054.61 samples/sec   Loss 5.9876   LearningRate 0.0165   Epoch: 11   Global Step: 147450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:36,356-Speed 3004.63 samples/sec   Loss 5.9496   LearningRate 0.0165   Epoch: 11   Global Step: 147460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:39,721-Speed 3043.96 samples/sec   Loss 5.9797   LearningRate 0.0165   Epoch: 11   Global Step: 147470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:43,084-Speed 3045.89 samples/sec   Loss 6.0057   LearningRate 0.0165   Epoch: 11   Global Step: 147480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:46,490-Speed 3006.88 samples/sec   Loss 5.9426   LearningRate 0.0165   Epoch: 11   Global Step: 147490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:49,979-Speed 2935.75 samples/sec   Loss 6.0489   LearningRate 0.0165   Epoch: 11   Global Step: 147500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:40:53,445-Speed 2955.61 samples/sec   Loss 6.0298   LearningRate 0.0165   Epoch: 11   Global Step: 147510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:40:56,868-Speed 2991.79 samples/sec   Loss 5.9177   LearningRate 0.0165   Epoch: 11   Global Step: 147520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:41:00,204-Speed 3071.05 samples/sec   Loss 6.0722   LearningRate 0.0165   Epoch: 11   Global Step: 147530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:41:03,572-Speed 3040.70 samples/sec   Loss 5.9957   LearningRate 0.0165   Epoch: 11   Global Step: 147540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:41:06,883-Speed 3093.66 samples/sec   Loss 5.9529   LearningRate 0.0165   Epoch: 11   Global Step: 147550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 15:41:10,265-Speed 3029.25 samples/sec   Loss 6.0191   LearningRate 0.0165   Epoch: 11   Global Step: 147560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:41:13,606-Speed 3065.84 samples/sec   Loss 5.8348   LearningRate 0.0165   Epoch: 11   Global Step: 147570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:41:17,016-Speed 3004.14 samples/sec   Loss 5.9947   LearningRate 0.0165   Epoch: 11   Global Step: 147580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:41:20,418-Speed 3010.86 samples/sec   Loss 5.9775   LearningRate 0.0165   Epoch: 11   Global Step: 147590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:41:23,761-Speed 3064.39 samples/sec   Loss 5.9893   LearningRate 0.0165   Epoch: 11   Global Step: 147600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:41:27,140-Speed 3031.37 samples/sec   Loss 5.8371   LearningRate 0.0165   Epoch: 11   Global Step: 147610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:41:30,518-Speed 3032.01 samples/sec   Loss 6.0167   LearningRate 0.0165   Epoch: 11   Global Step: 147620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:41:33,849-Speed 3074.69 samples/sec   Loss 6.0487   LearningRate 0.0165   Epoch: 11   Global Step: 147630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:41:37,211-Speed 3046.98 samples/sec   Loss 6.0646   LearningRate 0.0165   Epoch: 11   Global Step: 147640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 15:41:40,531-Speed 3085.58 samples/sec   Loss 5.8702   LearningRate 0.0165   Epoch: 11   Global Step: 147650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:41:43,846-Speed 3089.65 samples/sec   Loss 6.0108   LearningRate 0.0165   Epoch: 11   Global Step: 147660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:41:47,222-Speed 3034.31 samples/sec   Loss 6.0017   LearningRate 0.0164   Epoch: 11   Global Step: 147670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:41:50,580-Speed 3050.48 samples/sec   Loss 6.0129   LearningRate 0.0164   Epoch: 11   Global Step: 147680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:41:53,983-Speed 3009.83 samples/sec   Loss 6.0265   LearningRate 0.0164   Epoch: 11   Global Step: 147690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:41:57,349-Speed 3043.36 samples/sec   Loss 5.9183   LearningRate 0.0164   Epoch: 11   Global Step: 147700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:42:00,720-Speed 3038.29 samples/sec   Loss 5.9512   LearningRate 0.0164   Epoch: 11   Global Step: 147710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 15:42:04,083-Speed 3045.37 samples/sec   Loss 5.9739   LearningRate 0.0164   Epoch: 11   Global Step: 147720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:42:07,450-Speed 3042.45 samples/sec   Loss 5.9786   LearningRate 0.0164   Epoch: 11   Global Step: 147730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:42:10,816-Speed 3043.77 samples/sec   Loss 5.9061   LearningRate 0.0164   Epoch: 11   Global Step: 147740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:42:14,184-Speed 3040.26 samples/sec   Loss 5.9040   LearningRate 0.0164   Epoch: 11   Global Step: 147750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:42:17,582-Speed 3014.61 samples/sec   Loss 5.9582   LearningRate 0.0164   Epoch: 11   Global Step: 147760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:42:20,906-Speed 3082.01 samples/sec   Loss 6.0079   LearningRate 0.0164   Epoch: 11   Global Step: 147770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:42:24,245-Speed 3068.20 samples/sec   Loss 5.8886   LearningRate 0.0164   Epoch: 11   Global Step: 147780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:42:27,589-Speed 3062.41 samples/sec   Loss 5.9567   LearningRate 0.0164   Epoch: 11   Global Step: 147790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:42:31,036-Speed 2971.60 samples/sec   Loss 5.9379   LearningRate 0.0164   Epoch: 11   Global Step: 147800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:42:34,455-Speed 2995.88 samples/sec   Loss 5.9758   LearningRate 0.0164   Epoch: 11   Global Step: 147810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:42:37,781-Speed 3079.71 samples/sec   Loss 5.9527   LearningRate 0.0164   Epoch: 11   Global Step: 147820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:42:41,088-Speed 3097.71 samples/sec   Loss 5.9518   LearningRate 0.0164   Epoch: 11   Global Step: 147830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:42:44,456-Speed 3041.02 samples/sec   Loss 6.0153   LearningRate 0.0164   Epoch: 11   Global Step: 147840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:42:47,808-Speed 3055.29 samples/sec   Loss 5.8276   LearningRate 0.0164   Epoch: 11   Global Step: 147850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:42:51,238-Speed 2986.41 samples/sec   Loss 5.8825   LearningRate 0.0164   Epoch: 11   Global Step: 147860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:42:54,601-Speed 3046.57 samples/sec   Loss 5.8984   LearningRate 0.0164   Epoch: 11   Global Step: 147870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:42:57,978-Speed 3032.60 samples/sec   Loss 5.9929   LearningRate 0.0164   Epoch: 11   Global Step: 147880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:43:01,353-Speed 3034.80 samples/sec   Loss 5.9009   LearningRate 0.0164   Epoch: 11   Global Step: 147890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:43:04,733-Speed 3030.68 samples/sec   Loss 5.9133   LearningRate 0.0164   Epoch: 11   Global Step: 147900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:43:08,096-Speed 3045.16 samples/sec   Loss 5.9833   LearningRate 0.0164   Epoch: 11   Global Step: 147910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:43:11,422-Speed 3080.49 samples/sec   Loss 5.9513   LearningRate 0.0164   Epoch: 11   Global Step: 147920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:43:14,907-Speed 2938.94 samples/sec   Loss 6.0207   LearningRate 0.0164   Epoch: 11   Global Step: 147930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:43:18,288-Speed 3029.27 samples/sec   Loss 5.8782   LearningRate 0.0164   Epoch: 11   Global Step: 147940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:43:21,672-Speed 3027.58 samples/sec   Loss 5.8545   LearningRate 0.0164   Epoch: 11   Global Step: 147950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:43:25,118-Speed 2972.02 samples/sec   Loss 5.8800   LearningRate 0.0164   Epoch: 11   Global Step: 147960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:43:28,468-Speed 3058.29 samples/sec   Loss 5.9964   LearningRate 0.0164   Epoch: 11   Global Step: 147970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:43:31,897-Speed 2987.26 samples/sec   Loss 5.9414   LearningRate 0.0163   Epoch: 11   Global Step: 147980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:43:35,292-Speed 3017.13 samples/sec   Loss 5.9205   LearningRate 0.0163   Epoch: 11   Global Step: 147990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:43:38,679-Speed 3023.50 samples/sec   Loss 5.9853   LearningRate 0.0163   Epoch: 11   Global Step: 148000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:43:42,107-Speed 2988.34 samples/sec   Loss 5.8930   LearningRate 0.0163   Epoch: 11   Global Step: 148010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:43:45,403-Speed 3107.76 samples/sec   Loss 5.9210   LearningRate 0.0163   Epoch: 11   Global Step: 148020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:43:48,715-Speed 3093.06 samples/sec   Loss 6.0108   LearningRate 0.0163   Epoch: 11   Global Step: 148030   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:43:52,046-Speed 3075.56 samples/sec   Loss 6.0062   LearningRate 0.0163   Epoch: 11   Global Step: 148040   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:43:55,481-Speed 2980.94 samples/sec   Loss 5.9555   LearningRate 0.0163   Epoch: 11   Global Step: 148050   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:43:58,888-Speed 3006.92 samples/sec   Loss 5.9994   LearningRate 0.0163   Epoch: 11   Global Step: 148060   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:02,241-Speed 3054.64 samples/sec   Loss 5.9854   LearningRate 0.0163   Epoch: 11   Global Step: 148070   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:05,594-Speed 3054.51 samples/sec   Loss 5.8505   LearningRate 0.0163   Epoch: 11   Global Step: 148080   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:09,010-Speed 2999.26 samples/sec   Loss 5.8285   LearningRate 0.0163   Epoch: 11   Global Step: 148090   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:12,436-Speed 2989.19 samples/sec   Loss 5.9696   LearningRate 0.0163   Epoch: 11   Global Step: 148100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:15,766-Speed 3076.42 samples/sec   Loss 5.9337   LearningRate 0.0163   Epoch: 11   Global Step: 148110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:19,156-Speed 3021.37 samples/sec   Loss 5.9241   LearningRate 0.0163   Epoch: 11   Global Step: 148120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:44:22,590-Speed 2982.87 samples/sec   Loss 6.0097   LearningRate 0.0163   Epoch: 11   Global Step: 148130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:25,910-Speed 3084.66 samples/sec   Loss 5.9375   LearningRate 0.0163   Epoch: 11   Global Step: 148140   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:29,277-Speed 3042.86 samples/sec   Loss 5.8841   LearningRate 0.0163   Epoch: 11   Global Step: 148150   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:32,675-Speed 3013.77 samples/sec   Loss 6.0010   LearningRate 0.0163   Epoch: 11   Global Step: 148160   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:36,024-Speed 3058.61 samples/sec   Loss 6.0846   LearningRate 0.0163   Epoch: 11   Global Step: 148170   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:39,413-Speed 3022.34 samples/sec   Loss 6.0454   LearningRate 0.0163   Epoch: 11   Global Step: 148180   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:42,788-Speed 3034.54 samples/sec   Loss 5.8591   LearningRate 0.0163   Epoch: 11   Global Step: 148190   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:46,158-Speed 3039.47 samples/sec   Loss 5.9777   LearningRate 0.0163   Epoch: 11   Global Step: 148200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:49,573-Speed 2999.96 samples/sec   Loss 5.9744   LearningRate 0.0163   Epoch: 11   Global Step: 148210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:52,940-Speed 3041.74 samples/sec   Loss 5.9034   LearningRate 0.0163   Epoch: 11   Global Step: 148220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:44:56,343-Speed 3010.09 samples/sec   Loss 5.9723   LearningRate 0.0163   Epoch: 11   Global Step: 148230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:44:59,680-Speed 3068.70 samples/sec   Loss 5.9124   LearningRate 0.0163   Epoch: 11   Global Step: 148240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:03,102-Speed 2993.67 samples/sec   Loss 5.9432   LearningRate 0.0163   Epoch: 11   Global Step: 148250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:06,547-Speed 2973.42 samples/sec   Loss 5.9370   LearningRate 0.0163   Epoch: 11   Global Step: 148260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:09,923-Speed 3034.10 samples/sec   Loss 5.8015   LearningRate 0.0163   Epoch: 11   Global Step: 148270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:13,351-Speed 2987.57 samples/sec   Loss 5.8036   LearningRate 0.0162   Epoch: 11   Global Step: 148280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:16,745-Speed 3018.15 samples/sec   Loss 5.9921   LearningRate 0.0162   Epoch: 11   Global Step: 148290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:20,148-Speed 3009.65 samples/sec   Loss 5.9536   LearningRate 0.0162   Epoch: 11   Global Step: 148300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:23,595-Speed 2971.86 samples/sec   Loss 5.8668   LearningRate 0.0162   Epoch: 11   Global Step: 148310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:26,924-Speed 3077.17 samples/sec   Loss 5.9138   LearningRate 0.0162   Epoch: 11   Global Step: 148320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:30,269-Speed 3061.82 samples/sec   Loss 5.9273   LearningRate 0.0162   Epoch: 11   Global Step: 148330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:33,640-Speed 3038.60 samples/sec   Loss 5.9064   LearningRate 0.0162   Epoch: 11   Global Step: 148340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:37,008-Speed 3041.47 samples/sec   Loss 5.9250   LearningRate 0.0162   Epoch: 11   Global Step: 148350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:40,398-Speed 3021.51 samples/sec   Loss 5.9949   LearningRate 0.0162   Epoch: 11   Global Step: 148360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:43,776-Speed 3032.78 samples/sec   Loss 5.9529   LearningRate 0.0162   Epoch: 11   Global Step: 148370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:47,121-Speed 3061.68 samples/sec   Loss 5.8677   LearningRate 0.0162   Epoch: 11   Global Step: 148380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:50,503-Speed 3029.14 samples/sec   Loss 5.8853   LearningRate 0.0162   Epoch: 11   Global Step: 148390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:53,873-Speed 3039.05 samples/sec   Loss 5.8766   LearningRate 0.0162   Epoch: 11   Global Step: 148400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:45:57,236-Speed 3046.11 samples/sec   Loss 5.9172   LearningRate 0.0162   Epoch: 11   Global Step: 148410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:46:00,579-Speed 3063.68 samples/sec   Loss 5.8808   LearningRate 0.0162   Epoch: 11   Global Step: 148420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:46:03,980-Speed 3011.78 samples/sec   Loss 5.8846   LearningRate 0.0162   Epoch: 11   Global Step: 148430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:46:07,397-Speed 2997.67 samples/sec   Loss 5.9052   LearningRate 0.0162   Epoch: 11   Global Step: 148440   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:46:10,789-Speed 3019.73 samples/sec   Loss 5.9474   LearningRate 0.0162   Epoch: 11   Global Step: 148450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:46:14,235-Speed 2972.35 samples/sec   Loss 5.9047   LearningRate 0.0162   Epoch: 11   Global Step: 148460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:46:17,570-Speed 3071.95 samples/sec   Loss 5.9735   LearningRate 0.0162   Epoch: 11   Global Step: 148470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:46:20,990-Speed 2994.81 samples/sec   Loss 5.8544   LearningRate 0.0162   Epoch: 11   Global Step: 148480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:46:24,342-Speed 3055.32 samples/sec   Loss 5.9786   LearningRate 0.0162   Epoch: 11   Global Step: 148490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:46:27,840-Speed 2928.60 samples/sec   Loss 5.8669   LearningRate 0.0162   Epoch: 11   Global Step: 148500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:46:31,208-Speed 3041.41 samples/sec   Loss 5.9149   LearningRate 0.0162   Epoch: 11   Global Step: 148510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:46:34,545-Speed 3069.50 samples/sec   Loss 5.9739   LearningRate 0.0162   Epoch: 11   Global Step: 148520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:46:37,932-Speed 3023.68 samples/sec   Loss 5.9385   LearningRate 0.0162   Epoch: 11   Global Step: 148530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:46:41,306-Speed 3036.62 samples/sec   Loss 5.8556   LearningRate 0.0162   Epoch: 11   Global Step: 148540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:46:44,677-Speed 3038.09 samples/sec   Loss 5.9852   LearningRate 0.0162   Epoch: 11   Global Step: 148550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:46:48,079-Speed 3010.56 samples/sec   Loss 5.9394   LearningRate 0.0162   Epoch: 11   Global Step: 148560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:46:51,505-Speed 2992.48 samples/sec   Loss 5.9167   LearningRate 0.0162   Epoch: 11   Global Step: 148570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:46:54,917-Speed 3002.41 samples/sec   Loss 6.0273   LearningRate 0.0162   Epoch: 11   Global Step: 148580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:46:58,255-Speed 3068.33 samples/sec   Loss 5.9417   LearningRate 0.0161   Epoch: 11   Global Step: 148590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:47:01,634-Speed 3031.03 samples/sec   Loss 5.8615   LearningRate 0.0161   Epoch: 11   Global Step: 148600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:47:05,019-Speed 3028.20 samples/sec   Loss 5.9728   LearningRate 0.0161   Epoch: 11   Global Step: 148610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:47:08,403-Speed 3026.45 samples/sec   Loss 5.9185   LearningRate 0.0161   Epoch: 11   Global Step: 148620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:47:11,799-Speed 3016.11 samples/sec   Loss 5.8830   LearningRate 0.0161   Epoch: 11   Global Step: 148630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:47:15,139-Speed 3066.78 samples/sec   Loss 5.9979   LearningRate 0.0161   Epoch: 11   Global Step: 148640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:47:18,489-Speed 3058.14 samples/sec   Loss 5.8261   LearningRate 0.0161   Epoch: 11   Global Step: 148650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:47:21,847-Speed 3050.34 samples/sec   Loss 5.8678   LearningRate 0.0161   Epoch: 11   Global Step: 148660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:47:25,259-Speed 3002.13 samples/sec   Loss 5.9145   LearningRate 0.0161   Epoch: 11   Global Step: 148670   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:47:28,619-Speed 3048.42 samples/sec   Loss 5.9573   LearningRate 0.0161   Epoch: 11   Global Step: 148680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:47:32,021-Speed 3010.47 samples/sec   Loss 5.9043   LearningRate 0.0161   Epoch: 11   Global Step: 148690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:47:35,428-Speed 3006.65 samples/sec   Loss 5.7796   LearningRate 0.0161   Epoch: 11   Global Step: 148700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:47:38,749-Speed 3084.08 samples/sec   Loss 5.9569   LearningRate 0.0161   Epoch: 11   Global Step: 148710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:47:42,051-Speed 3102.76 samples/sec   Loss 5.7885   LearningRate 0.0161   Epoch: 11   Global Step: 148720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:47:45,386-Speed 3071.51 samples/sec   Loss 6.0149   LearningRate 0.0161   Epoch: 11   Global Step: 148730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:47:48,761-Speed 3034.55 samples/sec   Loss 5.9636   LearningRate 0.0161   Epoch: 11   Global Step: 148740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:47:52,135-Speed 3035.95 samples/sec   Loss 5.7972   LearningRate 0.0161   Epoch: 11   Global Step: 148750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:47:55,495-Speed 3049.29 samples/sec   Loss 5.8198   LearningRate 0.0161   Epoch: 11   Global Step: 148760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:47:58,871-Speed 3033.65 samples/sec   Loss 5.8555   LearningRate 0.0161   Epoch: 11   Global Step: 148770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:48:02,268-Speed 3015.28 samples/sec   Loss 5.9170   LearningRate 0.0161   Epoch: 11   Global Step: 148780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:48:05,604-Speed 3070.32 samples/sec   Loss 5.9595   LearningRate 0.0161   Epoch: 11   Global Step: 148790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:48:09,019-Speed 2999.30 samples/sec   Loss 5.8776   LearningRate 0.0161   Epoch: 11   Global Step: 148800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:48:12,435-Speed 2998.28 samples/sec   Loss 5.9412   LearningRate 0.0161   Epoch: 11   Global Step: 148810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:48:15,779-Speed 3063.42 samples/sec   Loss 5.9709   LearningRate 0.0161   Epoch: 11   Global Step: 148820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:48:19,224-Speed 2973.01 samples/sec   Loss 5.9152   LearningRate 0.0161   Epoch: 11   Global Step: 148830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:48:22,541-Speed 3088.39 samples/sec   Loss 5.8332   LearningRate 0.0161   Epoch: 11   Global Step: 148840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:48:25,865-Speed 3081.70 samples/sec   Loss 5.9688   LearningRate 0.0161   Epoch: 11   Global Step: 148850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:48:29,268-Speed 3010.32 samples/sec   Loss 5.9060   LearningRate 0.0161   Epoch: 11   Global Step: 148860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:48:32,692-Speed 2991.04 samples/sec   Loss 5.8510   LearningRate 0.0161   Epoch: 11   Global Step: 148870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:48:36,121-Speed 2987.79 samples/sec   Loss 5.9106   LearningRate 0.0161   Epoch: 11   Global Step: 148880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:48:39,481-Speed 3048.56 samples/sec   Loss 5.9094   LearningRate 0.0161   Epoch: 11   Global Step: 148890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:48:42,905-Speed 2991.56 samples/sec   Loss 5.9345   LearningRate 0.0160   Epoch: 11   Global Step: 148900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:48:46,261-Speed 3052.01 samples/sec   Loss 5.8285   LearningRate 0.0160   Epoch: 11   Global Step: 148910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:48:49,648-Speed 3024.36 samples/sec   Loss 5.8886   LearningRate 0.0160   Epoch: 11   Global Step: 148920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:48:53,028-Speed 3030.03 samples/sec   Loss 5.7314   LearningRate 0.0160   Epoch: 11   Global Step: 148930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:48:56,429-Speed 3013.11 samples/sec   Loss 5.9387   LearningRate 0.0160   Epoch: 11   Global Step: 148940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:48:59,830-Speed 3012.30 samples/sec   Loss 6.0153   LearningRate 0.0160   Epoch: 11   Global Step: 148950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:49:03,224-Speed 3018.48 samples/sec   Loss 5.9060   LearningRate 0.0160   Epoch: 11   Global Step: 148960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:49:06,617-Speed 3018.65 samples/sec   Loss 5.7721   LearningRate 0.0160   Epoch: 11   Global Step: 148970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:49:09,959-Speed 3064.68 samples/sec   Loss 5.9487   LearningRate 0.0160   Epoch: 11   Global Step: 148980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:49:13,346-Speed 3024.37 samples/sec   Loss 5.9504   LearningRate 0.0160   Epoch: 11   Global Step: 148990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:49:16,743-Speed 3015.12 samples/sec   Loss 6.0002   LearningRate 0.0160   Epoch: 11   Global Step: 149000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:49:20,173-Speed 2986.29 samples/sec   Loss 5.9317   LearningRate 0.0160   Epoch: 11   Global Step: 149010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:49:23,540-Speed 3042.17 samples/sec   Loss 5.9357   LearningRate 0.0160   Epoch: 11   Global Step: 149020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:49:26,885-Speed 3061.92 samples/sec   Loss 5.9923   LearningRate 0.0160   Epoch: 11   Global Step: 149030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:49:30,206-Speed 3084.93 samples/sec   Loss 5.8323   LearningRate 0.0160   Epoch: 11   Global Step: 149040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:49:34,004-Speed 2696.34 samples/sec   Loss 5.8347   LearningRate 0.0160   Epoch: 11   Global Step: 149050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:06,256-Speed 317.52 samples/sec   Loss 4.7019   LearningRate 0.0160   Epoch: 12   Global Step: 149060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:09,808-Speed 2883.90 samples/sec   Loss 4.4558   LearningRate 0.0160   Epoch: 12   Global Step: 149070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:13,169-Speed 3047.49 samples/sec   Loss 4.4148   LearningRate 0.0160   Epoch: 12   Global Step: 149080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:16,837-Speed 2792.21 samples/sec   Loss 4.3830   LearningRate 0.0160   Epoch: 12   Global Step: 149090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:20,184-Speed 3060.33 samples/sec   Loss 4.3784   LearningRate 0.0160   Epoch: 12   Global Step: 149100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:23,539-Speed 3053.24 samples/sec   Loss 4.4723   LearningRate 0.0160   Epoch: 12   Global Step: 149110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:26,927-Speed 3023.35 samples/sec   Loss 4.5124   LearningRate 0.0160   Epoch: 12   Global Step: 149120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:30,360-Speed 2984.34 samples/sec   Loss 4.4501   LearningRate 0.0160   Epoch: 12   Global Step: 149130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:33,720-Speed 3048.88 samples/sec   Loss 4.5939   LearningRate 0.0160   Epoch: 12   Global Step: 149140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:50:37,116-Speed 3015.92 samples/sec   Loss 4.4897   LearningRate 0.0160   Epoch: 12   Global Step: 149150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:40,586-Speed 2952.13 samples/sec   Loss 4.5148   LearningRate 0.0160   Epoch: 12   Global Step: 149160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:43,968-Speed 3028.87 samples/sec   Loss 4.4455   LearningRate 0.0160   Epoch: 12   Global Step: 149170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:47,321-Speed 3054.88 samples/sec   Loss 4.4451   LearningRate 0.0160   Epoch: 12   Global Step: 149180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:50,626-Speed 3098.54 samples/sec   Loss 4.4437   LearningRate 0.0160   Epoch: 12   Global Step: 149190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:54,025-Speed 3013.98 samples/sec   Loss 4.4353   LearningRate 0.0160   Epoch: 12   Global Step: 149200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:50:57,470-Speed 2973.49 samples/sec   Loss 4.5410   LearningRate 0.0159   Epoch: 12   Global Step: 149210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:51:00,794-Speed 3081.23 samples/sec   Loss 4.5070   LearningRate 0.0159   Epoch: 12   Global Step: 149220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:51:04,256-Speed 2958.91 samples/sec   Loss 4.4963   LearningRate 0.0159   Epoch: 12   Global Step: 149230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:51:07,668-Speed 3001.87 samples/sec   Loss 4.4119   LearningRate 0.0159   Epoch: 12   Global Step: 149240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:51:11,009-Speed 3066.37 samples/sec   Loss 4.5738   LearningRate 0.0159   Epoch: 12   Global Step: 149250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:51:14,432-Speed 2991.83 samples/sec   Loss 4.5093   LearningRate 0.0159   Epoch: 12   Global Step: 149260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:51:17,838-Speed 3007.72 samples/sec   Loss 4.4970   LearningRate 0.0159   Epoch: 12   Global Step: 149270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:51:21,196-Speed 3050.71 samples/sec   Loss 4.5243   LearningRate 0.0159   Epoch: 12   Global Step: 149280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:51:24,546-Speed 3057.27 samples/sec   Loss 4.5702   LearningRate 0.0159   Epoch: 12   Global Step: 149290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:51:27,895-Speed 3058.54 samples/sec   Loss 4.4764   LearningRate 0.0159   Epoch: 12   Global Step: 149300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:51:31,345-Speed 2968.90 samples/sec   Loss 4.4836   LearningRate 0.0159   Epoch: 12   Global Step: 149310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:51:34,697-Speed 3056.09 samples/sec   Loss 4.5434   LearningRate 0.0159   Epoch: 12   Global Step: 149320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:51:38,108-Speed 3004.07 samples/sec   Loss 4.4532   LearningRate 0.0159   Epoch: 12   Global Step: 149330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:51:41,564-Speed 2963.34 samples/sec   Loss 4.4951   LearningRate 0.0159   Epoch: 12   Global Step: 149340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:51:44,956-Speed 3020.38 samples/sec   Loss 4.4742   LearningRate 0.0159   Epoch: 12   Global Step: 149350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:51:48,549-Speed 2850.69 samples/sec   Loss 4.6328   LearningRate 0.0159   Epoch: 12   Global Step: 149360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:51:52,028-Speed 2943.82 samples/sec   Loss 4.5832   LearningRate 0.0159   Epoch: 12   Global Step: 149370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:51:55,524-Speed 2930.74 samples/sec   Loss 4.5138   LearningRate 0.0159   Epoch: 12   Global Step: 149380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:51:58,922-Speed 3014.21 samples/sec   Loss 4.5411   LearningRate 0.0159   Epoch: 12   Global Step: 149390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:02,252-Speed 3076.66 samples/sec   Loss 4.6050   LearningRate 0.0159   Epoch: 12   Global Step: 149400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:05,661-Speed 3004.32 samples/sec   Loss 4.5707   LearningRate 0.0159   Epoch: 12   Global Step: 149410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:09,042-Speed 3030.23 samples/sec   Loss 4.6063   LearningRate 0.0159   Epoch: 12   Global Step: 149420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:12,380-Speed 3068.94 samples/sec   Loss 4.7001   LearningRate 0.0159   Epoch: 12   Global Step: 149430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:52:15,772-Speed 3019.36 samples/sec   Loss 4.5437   LearningRate 0.0159   Epoch: 12   Global Step: 149440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:52:19,145-Speed 3036.35 samples/sec   Loss 4.5551   LearningRate 0.0159   Epoch: 12   Global Step: 149450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:52:22,487-Speed 3065.15 samples/sec   Loss 4.6097   LearningRate 0.0159   Epoch: 12   Global Step: 149460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:25,852-Speed 3044.09 samples/sec   Loss 4.5867   LearningRate 0.0159   Epoch: 12   Global Step: 149470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:29,185-Speed 3072.93 samples/sec   Loss 4.4024   LearningRate 0.0159   Epoch: 12   Global Step: 149480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:32,581-Speed 3016.71 samples/sec   Loss 4.6082   LearningRate 0.0159   Epoch: 12   Global Step: 149490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:35,946-Speed 3043.92 samples/sec   Loss 4.4849   LearningRate 0.0159   Epoch: 12   Global Step: 149500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:39,382-Speed 2980.96 samples/sec   Loss 4.5454   LearningRate 0.0159   Epoch: 12   Global Step: 149510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:42,765-Speed 3028.04 samples/sec   Loss 4.5812   LearningRate 0.0158   Epoch: 12   Global Step: 149520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:46,157-Speed 3019.91 samples/sec   Loss 4.5702   LearningRate 0.0158   Epoch: 12   Global Step: 149530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:49,577-Speed 2994.53 samples/sec   Loss 4.6163   LearningRate 0.0158   Epoch: 12   Global Step: 149540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:52,944-Speed 3042.48 samples/sec   Loss 4.6672   LearningRate 0.0158   Epoch: 12   Global Step: 149550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:52:56,367-Speed 2992.00 samples/sec   Loss 4.6022   LearningRate 0.0158   Epoch: 12   Global Step: 149560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:52:59,771-Speed 3010.27 samples/sec   Loss 4.6159   LearningRate 0.0158   Epoch: 12   Global Step: 149570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:53:03,126-Speed 3053.39 samples/sec   Loss 4.6317   LearningRate 0.0158   Epoch: 12   Global Step: 149580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:53:06,490-Speed 3044.89 samples/sec   Loss 4.4729   LearningRate 0.0158   Epoch: 12   Global Step: 149590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:53:09,988-Speed 2927.61 samples/sec   Loss 4.6701   LearningRate 0.0158   Epoch: 12   Global Step: 149600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:53:13,403-Speed 3000.03 samples/sec   Loss 4.6367   LearningRate 0.0158   Epoch: 12   Global Step: 149610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:53:16,750-Speed 3060.13 samples/sec   Loss 4.6993   LearningRate 0.0158   Epoch: 12   Global Step: 149620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:53:20,097-Speed 3060.65 samples/sec   Loss 4.5614   LearningRate 0.0158   Epoch: 12   Global Step: 149630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:53:23,503-Speed 3006.67 samples/sec   Loss 4.5773   LearningRate 0.0158   Epoch: 12   Global Step: 149640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:53:26,832-Speed 3076.83 samples/sec   Loss 4.6714   LearningRate 0.0158   Epoch: 12   Global Step: 149650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:53:30,177-Speed 3062.38 samples/sec   Loss 4.6142   LearningRate 0.0158   Epoch: 12   Global Step: 149660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:53:33,630-Speed 2966.91 samples/sec   Loss 4.5361   LearningRate 0.0158   Epoch: 12   Global Step: 149670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:53:36,970-Speed 3066.77 samples/sec   Loss 4.5845   LearningRate 0.0158   Epoch: 12   Global Step: 149680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:53:40,424-Speed 2965.16 samples/sec   Loss 4.6559   LearningRate 0.0158   Epoch: 12   Global Step: 149690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:53:43,878-Speed 2965.81 samples/sec   Loss 4.6239   LearningRate 0.0158   Epoch: 12   Global Step: 149700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:53:47,321-Speed 2974.42 samples/sec   Loss 4.5937   LearningRate 0.0158   Epoch: 12   Global Step: 149710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:53:50,779-Speed 2961.99 samples/sec   Loss 4.7000   LearningRate 0.0158   Epoch: 12   Global Step: 149720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:53:54,228-Speed 2969.59 samples/sec   Loss 4.6464   LearningRate 0.0158   Epoch: 12   Global Step: 149730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:53:57,542-Speed 3090.87 samples/sec   Loss 4.7655   LearningRate 0.0158   Epoch: 12   Global Step: 149740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:54:01,000-Speed 2962.59 samples/sec   Loss 4.4654   LearningRate 0.0158   Epoch: 12   Global Step: 149750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:54:04,323-Speed 3081.80 samples/sec   Loss 4.6290   LearningRate 0.0158   Epoch: 12   Global Step: 149760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:54:07,690-Speed 3043.35 samples/sec   Loss 4.7803   LearningRate 0.0158   Epoch: 12   Global Step: 149770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:54:11,067-Speed 3032.47 samples/sec   Loss 4.7963   LearningRate 0.0158   Epoch: 12   Global Step: 149780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:54:14,499-Speed 2985.63 samples/sec   Loss 4.7309   LearningRate 0.0158   Epoch: 12   Global Step: 149790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:54:17,835-Speed 3070.37 samples/sec   Loss 4.5757   LearningRate 0.0158   Epoch: 12   Global Step: 149800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:54:21,251-Speed 2998.94 samples/sec   Loss 4.6835   LearningRate 0.0158   Epoch: 12   Global Step: 149810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:54:24,622-Speed 3038.00 samples/sec   Loss 4.7095   LearningRate 0.0158   Epoch: 12   Global Step: 149820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:54:28,097-Speed 2947.68 samples/sec   Loss 4.6472   LearningRate 0.0158   Epoch: 12   Global Step: 149830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:54:31,451-Speed 3054.14 samples/sec   Loss 4.6187   LearningRate 0.0157   Epoch: 12   Global Step: 149840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:54:34,803-Speed 3055.20 samples/sec   Loss 4.7087   LearningRate 0.0157   Epoch: 12   Global Step: 149850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:54:38,242-Speed 2978.94 samples/sec   Loss 4.6528   LearningRate 0.0157   Epoch: 12   Global Step: 149860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:54:41,598-Speed 3052.08 samples/sec   Loss 4.7455   LearningRate 0.0157   Epoch: 12   Global Step: 149870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:54:44,987-Speed 3022.33 samples/sec   Loss 4.6431   LearningRate 0.0157   Epoch: 12   Global Step: 149880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:54:48,430-Speed 2975.05 samples/sec   Loss 4.5934   LearningRate 0.0157   Epoch: 12   Global Step: 149890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:54:51,840-Speed 3003.18 samples/sec   Loss 4.6976   LearningRate 0.0157   Epoch: 12   Global Step: 149900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:54:55,244-Speed 3009.41 samples/sec   Loss 4.7669   LearningRate 0.0157   Epoch: 12   Global Step: 149910   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:54:58,601-Speed 3051.73 samples/sec   Loss 4.7286   LearningRate 0.0157   Epoch: 12   Global Step: 149920   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:55:01,944-Speed 3063.64 samples/sec   Loss 4.6321   LearningRate 0.0157   Epoch: 12   Global Step: 149930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:55:05,351-Speed 3006.46 samples/sec   Loss 4.7727   LearningRate 0.0157   Epoch: 12   Global Step: 149940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:55:08,665-Speed 3091.42 samples/sec   Loss 4.7538   LearningRate 0.0157   Epoch: 12   Global Step: 149950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:12,134-Speed 2952.07 samples/sec   Loss 4.6780   LearningRate 0.0157   Epoch: 12   Global Step: 149960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:15,603-Speed 2953.68 samples/sec   Loss 4.8165   LearningRate 0.0157   Epoch: 12   Global Step: 149970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:18,994-Speed 3020.85 samples/sec   Loss 4.7868   LearningRate 0.0157   Epoch: 12   Global Step: 149980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:22,376-Speed 3028.61 samples/sec   Loss 4.7165   LearningRate 0.0157   Epoch: 12   Global Step: 149990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:25,867-Speed 2933.93 samples/sec   Loss 4.6538   LearningRate 0.0157   Epoch: 12   Global Step: 150000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:29,252-Speed 3025.92 samples/sec   Loss 4.6442   LearningRate 0.0157   Epoch: 12   Global Step: 150010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:32,656-Speed 3008.98 samples/sec   Loss 4.8020   LearningRate 0.0157   Epoch: 12   Global Step: 150020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:36,091-Speed 2981.91 samples/sec   Loss 4.7782   LearningRate 0.0157   Epoch: 12   Global Step: 150030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:39,498-Speed 3006.04 samples/sec   Loss 4.7227   LearningRate 0.0157   Epoch: 12   Global Step: 150040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:42,863-Speed 3044.26 samples/sec   Loss 4.7491   LearningRate 0.0157   Epoch: 12   Global Step: 150050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:55:46,159-Speed 3107.25 samples/sec   Loss 4.7149   LearningRate 0.0157   Epoch: 12   Global Step: 150060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:49,468-Speed 3096.05 samples/sec   Loss 4.8018   LearningRate 0.0157   Epoch: 12   Global Step: 150070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:52,901-Speed 2983.73 samples/sec   Loss 4.7722   LearningRate 0.0157   Epoch: 12   Global Step: 150080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:56,275-Speed 3035.73 samples/sec   Loss 4.6844   LearningRate 0.0157   Epoch: 12   Global Step: 150090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:55:59,703-Speed 2987.71 samples/sec   Loss 4.6953   LearningRate 0.0157   Epoch: 12   Global Step: 150100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:03,079-Speed 3034.06 samples/sec   Loss 4.7106   LearningRate 0.0157   Epoch: 12   Global Step: 150110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:06,423-Speed 3062.80 samples/sec   Loss 4.7780   LearningRate 0.0157   Epoch: 12   Global Step: 150120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:09,797-Speed 3036.70 samples/sec   Loss 4.7328   LearningRate 0.0157   Epoch: 12   Global Step: 150130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:13,167-Speed 3039.52 samples/sec   Loss 4.7776   LearningRate 0.0157   Epoch: 12   Global Step: 150140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:16,536-Speed 3040.21 samples/sec   Loss 4.7457   LearningRate 0.0156   Epoch: 12   Global Step: 150150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:19,930-Speed 3018.12 samples/sec   Loss 4.7465   LearningRate 0.0156   Epoch: 12   Global Step: 150160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:56:23,268-Speed 3067.92 samples/sec   Loss 4.7647   LearningRate 0.0156   Epoch: 12   Global Step: 150170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:56:26,715-Speed 2971.77 samples/sec   Loss 4.8039   LearningRate 0.0156   Epoch: 12   Global Step: 150180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:30,075-Speed 3048.89 samples/sec   Loss 4.7088   LearningRate 0.0156   Epoch: 12   Global Step: 150190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:33,436-Speed 3047.81 samples/sec   Loss 4.8652   LearningRate 0.0156   Epoch: 12   Global Step: 150200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:36,814-Speed 3031.81 samples/sec   Loss 4.8823   LearningRate 0.0156   Epoch: 12   Global Step: 150210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:40,150-Speed 3070.86 samples/sec   Loss 4.7221   LearningRate 0.0156   Epoch: 12   Global Step: 150220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:43,481-Speed 3074.67 samples/sec   Loss 4.7907   LearningRate 0.0156   Epoch: 12   Global Step: 150230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:46,851-Speed 3039.68 samples/sec   Loss 4.7384   LearningRate 0.0156   Epoch: 12   Global Step: 150240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:50,169-Speed 3087.16 samples/sec   Loss 4.7681   LearningRate 0.0156   Epoch: 12   Global Step: 150250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:56:53,522-Speed 3054.47 samples/sec   Loss 4.7503   LearningRate 0.0156   Epoch: 12   Global Step: 150260   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:56:56,963-Speed 2977.24 samples/sec   Loss 4.8424   LearningRate 0.0156   Epoch: 12   Global Step: 150270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:00,319-Speed 3051.84 samples/sec   Loss 4.7021   LearningRate 0.0156   Epoch: 12   Global Step: 150280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:03,691-Speed 3037.23 samples/sec   Loss 4.7804   LearningRate 0.0156   Epoch: 12   Global Step: 150290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:07,121-Speed 2986.36 samples/sec   Loss 4.8073   LearningRate 0.0156   Epoch: 12   Global Step: 150300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:10,458-Speed 3069.28 samples/sec   Loss 4.7635   LearningRate 0.0156   Epoch: 12   Global Step: 150310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:13,788-Speed 3076.17 samples/sec   Loss 4.8124   LearningRate 0.0156   Epoch: 12   Global Step: 150320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:17,221-Speed 2983.93 samples/sec   Loss 4.8581   LearningRate 0.0156   Epoch: 12   Global Step: 150330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:20,635-Speed 3000.08 samples/sec   Loss 4.8280   LearningRate 0.0156   Epoch: 12   Global Step: 150340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:24,063-Speed 2988.38 samples/sec   Loss 4.7557   LearningRate 0.0156   Epoch: 12   Global Step: 150350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:27,400-Speed 3068.92 samples/sec   Loss 4.8795   LearningRate 0.0156   Epoch: 12   Global Step: 150360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:57:30,720-Speed 3085.71 samples/sec   Loss 4.6944   LearningRate 0.0156   Epoch: 12   Global Step: 150370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:57:34,111-Speed 3019.92 samples/sec   Loss 4.8473   LearningRate 0.0156   Epoch: 12   Global Step: 150380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:57:37,528-Speed 2998.31 samples/sec   Loss 4.8769   LearningRate 0.0156   Epoch: 12   Global Step: 150390   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:40,871-Speed 3063.81 samples/sec   Loss 4.7886   LearningRate 0.0156   Epoch: 12   Global Step: 150400   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:44,269-Speed 3015.31 samples/sec   Loss 4.7652   LearningRate 0.0156   Epoch: 12   Global Step: 150410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:47,695-Speed 2989.35 samples/sec   Loss 4.7927   LearningRate 0.0156   Epoch: 12   Global Step: 150420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:51,053-Speed 3050.75 samples/sec   Loss 4.7463   LearningRate 0.0156   Epoch: 12   Global Step: 150430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:54,448-Speed 3016.80 samples/sec   Loss 4.8488   LearningRate 0.0156   Epoch: 12   Global Step: 150440   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:57:57,799-Speed 3056.70 samples/sec   Loss 4.8110   LearningRate 0.0156   Epoch: 12   Global Step: 150450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:58:01,272-Speed 2949.31 samples/sec   Loss 4.8572   LearningRate 0.0155   Epoch: 12   Global Step: 150460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:58:04,595-Speed 3082.54 samples/sec   Loss 4.8963   LearningRate 0.0155   Epoch: 12   Global Step: 150470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:58:08,026-Speed 2985.59 samples/sec   Loss 4.8960   LearningRate 0.0155   Epoch: 12   Global Step: 150480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 15:58:11,420-Speed 3017.42 samples/sec   Loss 4.9173   LearningRate 0.0155   Epoch: 12   Global Step: 150490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:58:14,766-Speed 3061.72 samples/sec   Loss 4.7969   LearningRate 0.0155   Epoch: 12   Global Step: 150500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:58:18,177-Speed 3002.67 samples/sec   Loss 4.8782   LearningRate 0.0155   Epoch: 12   Global Step: 150510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:58:21,615-Speed 2979.92 samples/sec   Loss 4.8465   LearningRate 0.0155   Epoch: 12   Global Step: 150520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:58:25,068-Speed 2965.96 samples/sec   Loss 4.8579   LearningRate 0.0155   Epoch: 12   Global Step: 150530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:58:28,562-Speed 2932.23 samples/sec   Loss 4.7910   LearningRate 0.0155   Epoch: 12   Global Step: 150540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:58:31,938-Speed 3033.85 samples/sec   Loss 4.8378   LearningRate 0.0155   Epoch: 12   Global Step: 150550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:58:35,268-Speed 3075.24 samples/sec   Loss 4.9767   LearningRate 0.0155   Epoch: 12   Global Step: 150560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:58:38,720-Speed 2967.77 samples/sec   Loss 4.9708   LearningRate 0.0155   Epoch: 12   Global Step: 150570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:58:42,127-Speed 3006.35 samples/sec   Loss 4.9609   LearningRate 0.0155   Epoch: 12   Global Step: 150580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:58:45,536-Speed 3004.14 samples/sec   Loss 4.9618   LearningRate 0.0155   Epoch: 12   Global Step: 150590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:58:48,907-Speed 3039.24 samples/sec   Loss 4.8980   LearningRate 0.0155   Epoch: 12   Global Step: 150600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:58:52,270-Speed 3045.76 samples/sec   Loss 4.8709   LearningRate 0.0155   Epoch: 12   Global Step: 150610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:58:55,708-Speed 2979.05 samples/sec   Loss 4.8242   LearningRate 0.0155   Epoch: 12   Global Step: 150620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:58:59,057-Speed 3058.85 samples/sec   Loss 4.8885   LearningRate 0.0155   Epoch: 12   Global Step: 150630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:02,389-Speed 3073.35 samples/sec   Loss 4.9063   LearningRate 0.0155   Epoch: 12   Global Step: 150640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:05,713-Speed 3081.81 samples/sec   Loss 4.9184   LearningRate 0.0155   Epoch: 12   Global Step: 150650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:09,140-Speed 2988.58 samples/sec   Loss 4.7782   LearningRate 0.0155   Epoch: 12   Global Step: 150660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:12,502-Speed 3046.79 samples/sec   Loss 4.9602   LearningRate 0.0155   Epoch: 12   Global Step: 150670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:15,871-Speed 3040.44 samples/sec   Loss 4.9071   LearningRate 0.0155   Epoch: 12   Global Step: 150680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:19,224-Speed 3055.03 samples/sec   Loss 4.8558   LearningRate 0.0155   Epoch: 12   Global Step: 150690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:22,556-Speed 3073.97 samples/sec   Loss 5.0201   LearningRate 0.0155   Epoch: 12   Global Step: 150700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:25,869-Speed 3091.51 samples/sec   Loss 4.8942   LearningRate 0.0155   Epoch: 12   Global Step: 150710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:59:29,277-Speed 3006.20 samples/sec   Loss 4.9205   LearningRate 0.0155   Epoch: 12   Global Step: 150720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:59:32,652-Speed 3035.00 samples/sec   Loss 4.9435   LearningRate 0.0155   Epoch: 12   Global Step: 150730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:59:35,995-Speed 3063.29 samples/sec   Loss 4.9426   LearningRate 0.0155   Epoch: 12   Global Step: 150740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:59:39,379-Speed 3027.07 samples/sec   Loss 4.8783   LearningRate 0.0155   Epoch: 12   Global Step: 150750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 15:59:42,706-Speed 3079.37 samples/sec   Loss 4.8054   LearningRate 0.0155   Epoch: 12   Global Step: 150760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:46,045-Speed 3067.02 samples/sec   Loss 4.9613   LearningRate 0.0155   Epoch: 12   Global Step: 150770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:49,447-Speed 3010.31 samples/sec   Loss 4.9745   LearningRate 0.0154   Epoch: 12   Global Step: 150780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:52,828-Speed 3029.70 samples/sec   Loss 4.9459   LearningRate 0.0154   Epoch: 12   Global Step: 150790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:56,161-Speed 3074.03 samples/sec   Loss 5.0065   LearningRate 0.0154   Epoch: 12   Global Step: 150800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 15:59:59,511-Speed 3057.15 samples/sec   Loss 4.9027   LearningRate 0.0154   Epoch: 12   Global Step: 150810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:00:02,883-Speed 3037.63 samples/sec   Loss 5.0000   LearningRate 0.0154   Epoch: 12   Global Step: 150820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:00:06,267-Speed 3026.71 samples/sec   Loss 4.8740   LearningRate 0.0154   Epoch: 12   Global Step: 150830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:00:09,624-Speed 3051.96 samples/sec   Loss 4.9721   LearningRate 0.0154   Epoch: 12   Global Step: 150840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:00:13,054-Speed 2986.15 samples/sec   Loss 4.9701   LearningRate 0.0154   Epoch: 12   Global Step: 150850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:00:16,422-Speed 3040.59 samples/sec   Loss 4.8777   LearningRate 0.0154   Epoch: 12   Global Step: 150860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:00:19,777-Speed 3054.58 samples/sec   Loss 4.9775   LearningRate 0.0154   Epoch: 12   Global Step: 150870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:00:23,100-Speed 3082.53 samples/sec   Loss 4.8554   LearningRate 0.0154   Epoch: 12   Global Step: 150880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:00:26,414-Speed 3090.39 samples/sec   Loss 4.9251   LearningRate 0.0154   Epoch: 12   Global Step: 150890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:00:29,847-Speed 2983.43 samples/sec   Loss 4.9091   LearningRate 0.0154   Epoch: 12   Global Step: 150900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:00:33,191-Speed 3063.26 samples/sec   Loss 5.0354   LearningRate 0.0154   Epoch: 12   Global Step: 150910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:00:36,566-Speed 3034.74 samples/sec   Loss 5.0130   LearningRate 0.0154   Epoch: 12   Global Step: 150920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:00:39,991-Speed 2991.09 samples/sec   Loss 5.0224   LearningRate 0.0154   Epoch: 12   Global Step: 150930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:00:43,374-Speed 3027.55 samples/sec   Loss 4.8984   LearningRate 0.0154   Epoch: 12   Global Step: 150940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:00:46,740-Speed 3043.07 samples/sec   Loss 4.9431   LearningRate 0.0154   Epoch: 12   Global Step: 150950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:00:50,131-Speed 3020.89 samples/sec   Loss 4.8955   LearningRate 0.0154   Epoch: 12   Global Step: 150960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:00:53,477-Speed 3060.51 samples/sec   Loss 4.9414   LearningRate 0.0154   Epoch: 12   Global Step: 150970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:00:56,822-Speed 3061.91 samples/sec   Loss 4.9243   LearningRate 0.0154   Epoch: 12   Global Step: 150980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:01:00,262-Speed 2978.10 samples/sec   Loss 4.9233   LearningRate 0.0154   Epoch: 12   Global Step: 150990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:01:03,666-Speed 3008.61 samples/sec   Loss 4.8830   LearningRate 0.0154   Epoch: 12   Global Step: 151000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:01:07,204-Speed 2895.26 samples/sec   Loss 4.9302   LearningRate 0.0154   Epoch: 12   Global Step: 151010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:01:10,642-Speed 2979.53 samples/sec   Loss 5.0004   LearningRate 0.0154   Epoch: 12   Global Step: 151020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:01:14,054-Speed 3001.73 samples/sec   Loss 4.9736   LearningRate 0.0154   Epoch: 12   Global Step: 151030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:01:17,405-Speed 3056.97 samples/sec   Loss 4.8737   LearningRate 0.0154   Epoch: 12   Global Step: 151040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:01:20,843-Speed 2979.64 samples/sec   Loss 4.9121   LearningRate 0.0154   Epoch: 12   Global Step: 151050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:01:24,182-Speed 3067.04 samples/sec   Loss 4.9650   LearningRate 0.0154   Epoch: 12   Global Step: 151060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:01:27,548-Speed 3043.41 samples/sec   Loss 4.9875   LearningRate 0.0154   Epoch: 12   Global Step: 151070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:01:30,950-Speed 3010.75 samples/sec   Loss 5.0377   LearningRate 0.0154   Epoch: 12   Global Step: 151080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:01:34,357-Speed 3006.51 samples/sec   Loss 4.9522   LearningRate 0.0154   Epoch: 12   Global Step: 151090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:01:37,735-Speed 3032.03 samples/sec   Loss 5.0311   LearningRate 0.0153   Epoch: 12   Global Step: 151100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:01:41,097-Speed 3047.41 samples/sec   Loss 4.9596   LearningRate 0.0153   Epoch: 12   Global Step: 151110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:01:44,486-Speed 3022.37 samples/sec   Loss 5.0041   LearningRate 0.0153   Epoch: 12   Global Step: 151120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:01:47,879-Speed 3019.09 samples/sec   Loss 5.0109   LearningRate 0.0153   Epoch: 12   Global Step: 151130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:01:51,227-Speed 3059.30 samples/sec   Loss 4.9691   LearningRate 0.0153   Epoch: 12   Global Step: 151140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:01:54,623-Speed 3015.71 samples/sec   Loss 5.0395   LearningRate 0.0153   Epoch: 12   Global Step: 151150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:01:58,012-Speed 3022.91 samples/sec   Loss 5.0352   LearningRate 0.0153   Epoch: 12   Global Step: 151160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:01,351-Speed 3067.01 samples/sec   Loss 5.0111   LearningRate 0.0153   Epoch: 12   Global Step: 151170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:04,729-Speed 3032.62 samples/sec   Loss 4.8391   LearningRate 0.0153   Epoch: 12   Global Step: 151180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:08,104-Speed 3034.94 samples/sec   Loss 4.9195   LearningRate 0.0153   Epoch: 12   Global Step: 151190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:11,503-Speed 3013.34 samples/sec   Loss 5.0974   LearningRate 0.0153   Epoch: 12   Global Step: 151200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:14,932-Speed 2987.60 samples/sec   Loss 5.0485   LearningRate 0.0153   Epoch: 12   Global Step: 151210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:18,308-Speed 3034.38 samples/sec   Loss 4.9571   LearningRate 0.0153   Epoch: 12   Global Step: 151220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:21,711-Speed 3009.71 samples/sec   Loss 5.0171   LearningRate 0.0153   Epoch: 12   Global Step: 151230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:25,117-Speed 3007.10 samples/sec   Loss 4.9251   LearningRate 0.0153   Epoch: 12   Global Step: 151240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:28,543-Speed 2989.59 samples/sec   Loss 5.0288   LearningRate 0.0153   Epoch: 12   Global Step: 151250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:02:31,966-Speed 2992.44 samples/sec   Loss 5.0384   LearningRate 0.0153   Epoch: 12   Global Step: 151260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:02:35,338-Speed 3038.18 samples/sec   Loss 5.0391   LearningRate 0.0153   Epoch: 12   Global Step: 151270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:02:38,717-Speed 3031.30 samples/sec   Loss 4.9884   LearningRate 0.0153   Epoch: 12   Global Step: 151280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:02:42,049-Speed 3074.30 samples/sec   Loss 5.1616   LearningRate 0.0153   Epoch: 12   Global Step: 151290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:45,422-Speed 3036.81 samples/sec   Loss 4.9656   LearningRate 0.0153   Epoch: 12   Global Step: 151300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:48,745-Speed 3082.82 samples/sec   Loss 4.9075   LearningRate 0.0153   Epoch: 12   Global Step: 151310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:52,120-Speed 3034.59 samples/sec   Loss 5.0633   LearningRate 0.0153   Epoch: 12   Global Step: 151320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:55,549-Speed 2987.01 samples/sec   Loss 5.0096   LearningRate 0.0153   Epoch: 12   Global Step: 151330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:02:58,905-Speed 3051.75 samples/sec   Loss 5.0181   LearningRate 0.0153   Epoch: 12   Global Step: 151340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:02,250-Speed 3062.59 samples/sec   Loss 4.8715   LearningRate 0.0153   Epoch: 12   Global Step: 151350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:05,626-Speed 3033.57 samples/sec   Loss 5.0671   LearningRate 0.0153   Epoch: 12   Global Step: 151360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:09,083-Speed 2963.75 samples/sec   Loss 4.9997   LearningRate 0.0153   Epoch: 12   Global Step: 151370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:12,532-Speed 2969.21 samples/sec   Loss 4.9874   LearningRate 0.0153   Epoch: 12   Global Step: 151380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:15,935-Speed 3009.62 samples/sec   Loss 5.0667   LearningRate 0.0153   Epoch: 12   Global Step: 151390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:03:19,326-Speed 3021.06 samples/sec   Loss 5.0502   LearningRate 0.0153   Epoch: 12   Global Step: 151400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:03:22,669-Speed 3064.36 samples/sec   Loss 5.0415   LearningRate 0.0152   Epoch: 12   Global Step: 151410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:03:26,070-Speed 3012.23 samples/sec   Loss 4.9245   LearningRate 0.0152   Epoch: 12   Global Step: 151420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:03:29,440-Speed 3038.80 samples/sec   Loss 5.0822   LearningRate 0.0152   Epoch: 12   Global Step: 151430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:03:32,806-Speed 3042.94 samples/sec   Loss 5.1016   LearningRate 0.0152   Epoch: 12   Global Step: 151440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:36,199-Speed 3018.55 samples/sec   Loss 5.1217   LearningRate 0.0152   Epoch: 12   Global Step: 151450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:39,629-Speed 2986.63 samples/sec   Loss 4.9793   LearningRate 0.0152   Epoch: 12   Global Step: 151460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:42,980-Speed 3057.32 samples/sec   Loss 4.9662   LearningRate 0.0152   Epoch: 12   Global Step: 151470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:46,317-Speed 3069.60 samples/sec   Loss 5.0255   LearningRate 0.0152   Epoch: 12   Global Step: 151480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:49,767-Speed 2968.51 samples/sec   Loss 5.1253   LearningRate 0.0152   Epoch: 12   Global Step: 151490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:53,167-Speed 3013.18 samples/sec   Loss 5.0001   LearningRate 0.0152   Epoch: 12   Global Step: 151500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:56,599-Speed 2984.77 samples/sec   Loss 5.0568   LearningRate 0.0152   Epoch: 12   Global Step: 151510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:03:59,970-Speed 3037.84 samples/sec   Loss 4.9623   LearningRate 0.0152   Epoch: 12   Global Step: 151520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:04:03,365-Speed 3017.75 samples/sec   Loss 5.0699   LearningRate 0.0152   Epoch: 12   Global Step: 151530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:04:06,726-Speed 3047.21 samples/sec   Loss 5.1189   LearningRate 0.0152   Epoch: 12   Global Step: 151540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:04:10,086-Speed 3048.09 samples/sec   Loss 5.0388   LearningRate 0.0152   Epoch: 12   Global Step: 151550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:04:13,451-Speed 3044.49 samples/sec   Loss 5.0754   LearningRate 0.0152   Epoch: 12   Global Step: 151560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:04:16,798-Speed 3060.27 samples/sec   Loss 5.1318   LearningRate 0.0152   Epoch: 12   Global Step: 151570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:04:20,132-Speed 3072.45 samples/sec   Loss 5.0405   LearningRate 0.0152   Epoch: 12   Global Step: 151580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:04:23,533-Speed 3011.51 samples/sec   Loss 5.0650   LearningRate 0.0152   Epoch: 12   Global Step: 151590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:04:26,901-Speed 3041.38 samples/sec   Loss 5.1415   LearningRate 0.0152   Epoch: 12   Global Step: 151600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:04:30,337-Speed 2980.81 samples/sec   Loss 5.0298   LearningRate 0.0152   Epoch: 12   Global Step: 151610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:04:33,725-Speed 3023.29 samples/sec   Loss 5.0683   LearningRate 0.0152   Epoch: 12   Global Step: 151620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:04:37,104-Speed 3031.22 samples/sec   Loss 4.9996   LearningRate 0.0152   Epoch: 12   Global Step: 151630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:04:40,519-Speed 3000.21 samples/sec   Loss 5.1674   LearningRate 0.0152   Epoch: 12   Global Step: 151640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:04:43,893-Speed 3035.36 samples/sec   Loss 5.1425   LearningRate 0.0152   Epoch: 12   Global Step: 151650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:04:47,333-Speed 2977.77 samples/sec   Loss 5.0090   LearningRate 0.0152   Epoch: 12   Global Step: 151660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:04:50,663-Speed 3075.79 samples/sec   Loss 5.1261   LearningRate 0.0152   Epoch: 12   Global Step: 151670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:04:53,981-Speed 3087.05 samples/sec   Loss 5.1155   LearningRate 0.0152   Epoch: 12   Global Step: 151680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:04:57,316-Speed 3071.44 samples/sec   Loss 5.0451   LearningRate 0.0152   Epoch: 12   Global Step: 151690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:00,731-Speed 2999.48 samples/sec   Loss 5.0666   LearningRate 0.0152   Epoch: 12   Global Step: 151700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:04,151-Speed 2994.87 samples/sec   Loss 5.0735   LearningRate 0.0152   Epoch: 12   Global Step: 151710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:07,549-Speed 3014.14 samples/sec   Loss 5.0345   LearningRate 0.0152   Epoch: 12   Global Step: 151720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:10,994-Speed 2973.84 samples/sec   Loss 5.1010   LearningRate 0.0151   Epoch: 12   Global Step: 151730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:14,440-Speed 2972.14 samples/sec   Loss 5.0865   LearningRate 0.0151   Epoch: 12   Global Step: 151740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:17,769-Speed 3077.04 samples/sec   Loss 5.0921   LearningRate 0.0151   Epoch: 12   Global Step: 151750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:21,096-Speed 3078.46 samples/sec   Loss 5.0002   LearningRate 0.0151   Epoch: 12   Global Step: 151760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:24,406-Speed 3094.62 samples/sec   Loss 5.0672   LearningRate 0.0151   Epoch: 12   Global Step: 151770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:27,824-Speed 2997.34 samples/sec   Loss 5.0684   LearningRate 0.0151   Epoch: 12   Global Step: 151780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:31,178-Speed 3054.10 samples/sec   Loss 5.0434   LearningRate 0.0151   Epoch: 12   Global Step: 151790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:34,573-Speed 3016.67 samples/sec   Loss 5.1084   LearningRate 0.0151   Epoch: 12   Global Step: 151800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:38,038-Speed 2956.65 samples/sec   Loss 5.1459   LearningRate 0.0151   Epoch: 12   Global Step: 151810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:41,404-Speed 3043.69 samples/sec   Loss 5.1703   LearningRate 0.0151   Epoch: 12   Global Step: 151820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:44,925-Speed 2908.99 samples/sec   Loss 5.0737   LearningRate 0.0151   Epoch: 12   Global Step: 151830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:05:48,284-Speed 3049.79 samples/sec   Loss 5.1221   LearningRate 0.0151   Epoch: 12   Global Step: 151840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:05:51,638-Speed 3054.05 samples/sec   Loss 5.1147   LearningRate 0.0151   Epoch: 12   Global Step: 151850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:05:55,068-Speed 2985.86 samples/sec   Loss 4.9785   LearningRate 0.0151   Epoch: 12   Global Step: 151860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:05:58,503-Speed 2982.18 samples/sec   Loss 5.0831   LearningRate 0.0151   Epoch: 12   Global Step: 151870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:06:01,933-Speed 2986.76 samples/sec   Loss 5.1992   LearningRate 0.0151   Epoch: 12   Global Step: 151880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:06:05,335-Speed 3010.02 samples/sec   Loss 5.1849   LearningRate 0.0151   Epoch: 12   Global Step: 151890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:06:08,726-Speed 3021.16 samples/sec   Loss 5.1178   LearningRate 0.0151   Epoch: 12   Global Step: 151900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:06:12,092-Speed 3042.92 samples/sec   Loss 5.0604   LearningRate 0.0151   Epoch: 12   Global Step: 151910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:06:15,449-Speed 3051.87 samples/sec   Loss 5.1513   LearningRate 0.0151   Epoch: 12   Global Step: 151920   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:06:18,835-Speed 3025.19 samples/sec   Loss 5.1971   LearningRate 0.0151   Epoch: 12   Global Step: 151930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:06:22,166-Speed 3074.62 samples/sec   Loss 5.1637   LearningRate 0.0151   Epoch: 12   Global Step: 151940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:06:25,535-Speed 3040.16 samples/sec   Loss 5.1030   LearningRate 0.0151   Epoch: 12   Global Step: 151950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:06:28,903-Speed 3041.92 samples/sec   Loss 5.1954   LearningRate 0.0151   Epoch: 12   Global Step: 151960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:06:32,310-Speed 3006.09 samples/sec   Loss 5.1528   LearningRate 0.0151   Epoch: 12   Global Step: 151970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:06:35,663-Speed 3054.65 samples/sec   Loss 5.1156   LearningRate 0.0151   Epoch: 12   Global Step: 151980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:06:39,120-Speed 2962.81 samples/sec   Loss 5.0520   LearningRate 0.0151   Epoch: 12   Global Step: 151990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:06:42,431-Speed 3094.11 samples/sec   Loss 5.1104   LearningRate 0.0151   Epoch: 12   Global Step: 152000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:06:45,789-Speed 3049.78 samples/sec   Loss 5.1182   LearningRate 0.0151   Epoch: 12   Global Step: 152010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:06:49,152-Speed 3046.21 samples/sec   Loss 5.0768   LearningRate 0.0151   Epoch: 12   Global Step: 152020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:06:52,569-Speed 2997.45 samples/sec   Loss 5.1648   LearningRate 0.0151   Epoch: 12   Global Step: 152030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:06:55,984-Speed 2999.19 samples/sec   Loss 5.0954   LearningRate 0.0151   Epoch: 12   Global Step: 152040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:06:59,370-Speed 3024.89 samples/sec   Loss 5.1880   LearningRate 0.0150   Epoch: 12   Global Step: 152050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:07:02,714-Speed 3062.88 samples/sec   Loss 5.0559   LearningRate 0.0150   Epoch: 12   Global Step: 152060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:07:06,019-Speed 3099.44 samples/sec   Loss 5.1366   LearningRate 0.0150   Epoch: 12   Global Step: 152070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:07:09,355-Speed 3070.24 samples/sec   Loss 5.1298   LearningRate 0.0150   Epoch: 12   Global Step: 152080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:07:12,752-Speed 3015.47 samples/sec   Loss 5.1377   LearningRate 0.0150   Epoch: 12   Global Step: 152090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:07:16,069-Speed 3087.59 samples/sec   Loss 5.1694   LearningRate 0.0150   Epoch: 12   Global Step: 152100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:07:19,468-Speed 3013.73 samples/sec   Loss 5.2600   LearningRate 0.0150   Epoch: 12   Global Step: 152110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:07:22,811-Speed 3064.39 samples/sec   Loss 5.1525   LearningRate 0.0150   Epoch: 12   Global Step: 152120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:07:26,134-Speed 3082.93 samples/sec   Loss 5.1059   LearningRate 0.0150   Epoch: 12   Global Step: 152130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:07:29,475-Speed 3066.15 samples/sec   Loss 5.1483   LearningRate 0.0150   Epoch: 12   Global Step: 152140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:07:32,799-Speed 3081.24 samples/sec   Loss 5.1613   LearningRate 0.0150   Epoch: 12   Global Step: 152150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:07:36,172-Speed 3036.16 samples/sec   Loss 5.1675   LearningRate 0.0150   Epoch: 12   Global Step: 152160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:07:39,507-Speed 3071.73 samples/sec   Loss 5.1500   LearningRate 0.0150   Epoch: 12   Global Step: 152170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:07:42,864-Speed 3051.58 samples/sec   Loss 5.0720   LearningRate 0.0150   Epoch: 12   Global Step: 152180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:07:46,208-Speed 3062.65 samples/sec   Loss 5.1058   LearningRate 0.0150   Epoch: 12   Global Step: 152190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:07:49,527-Speed 3086.61 samples/sec   Loss 5.1202   LearningRate 0.0150   Epoch: 12   Global Step: 152200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:07:52,915-Speed 3023.80 samples/sec   Loss 5.1170   LearningRate 0.0150   Epoch: 12   Global Step: 152210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:07:56,267-Speed 3055.76 samples/sec   Loss 5.1478   LearningRate 0.0150   Epoch: 12   Global Step: 152220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:07:59,656-Speed 3022.20 samples/sec   Loss 5.2333   LearningRate 0.0150   Epoch: 12   Global Step: 152230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:08:03,083-Speed 2989.15 samples/sec   Loss 5.1223   LearningRate 0.0150   Epoch: 12   Global Step: 152240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:08:06,431-Speed 3059.43 samples/sec   Loss 5.2112   LearningRate 0.0150   Epoch: 12   Global Step: 152250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:08:09,877-Speed 2973.03 samples/sec   Loss 5.1919   LearningRate 0.0150   Epoch: 12   Global Step: 152260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:08:13,273-Speed 3016.29 samples/sec   Loss 5.0476   LearningRate 0.0150   Epoch: 12   Global Step: 152270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:08:16,667-Speed 3017.88 samples/sec   Loss 5.1850   LearningRate 0.0150   Epoch: 12   Global Step: 152280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:08:20,048-Speed 3029.34 samples/sec   Loss 5.2112   LearningRate 0.0150   Epoch: 12   Global Step: 152290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:08:23,443-Speed 3016.92 samples/sec   Loss 5.1935   LearningRate 0.0150   Epoch: 12   Global Step: 152300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:08:26,780-Speed 3069.56 samples/sec   Loss 5.0629   LearningRate 0.0150   Epoch: 12   Global Step: 152310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:08:30,132-Speed 3056.20 samples/sec   Loss 5.1560   LearningRate 0.0150   Epoch: 12   Global Step: 152320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:08:33,547-Speed 2999.32 samples/sec   Loss 5.1293   LearningRate 0.0150   Epoch: 12   Global Step: 152330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:08:36,918-Speed 3038.03 samples/sec   Loss 5.1900   LearningRate 0.0150   Epoch: 12   Global Step: 152340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:08:40,296-Speed 3032.51 samples/sec   Loss 5.2122   LearningRate 0.0150   Epoch: 12   Global Step: 152350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:08:43,614-Speed 3086.81 samples/sec   Loss 5.2495   LearningRate 0.0150   Epoch: 12   Global Step: 152360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:08:47,031-Speed 2997.45 samples/sec   Loss 5.1843   LearningRate 0.0149   Epoch: 12   Global Step: 152370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:08:50,438-Speed 3006.32 samples/sec   Loss 5.1936   LearningRate 0.0149   Epoch: 12   Global Step: 152380   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:08:53,793-Speed 3053.24 samples/sec   Loss 5.2251   LearningRate 0.0149   Epoch: 12   Global Step: 152390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:08:57,284-Speed 2933.66 samples/sec   Loss 5.2227   LearningRate 0.0149   Epoch: 12   Global Step: 152400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:00,699-Speed 3000.51 samples/sec   Loss 5.2477   LearningRate 0.0149   Epoch: 12   Global Step: 152410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:04,084-Speed 3025.07 samples/sec   Loss 5.1361   LearningRate 0.0149   Epoch: 12   Global Step: 152420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:07,491-Speed 3006.75 samples/sec   Loss 5.1330   LearningRate 0.0149   Epoch: 12   Global Step: 152430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:10,967-Speed 2946.68 samples/sec   Loss 5.0906   LearningRate 0.0149   Epoch: 12   Global Step: 152440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:14,334-Speed 3041.97 samples/sec   Loss 5.1225   LearningRate 0.0149   Epoch: 12   Global Step: 152450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:17,699-Speed 3043.83 samples/sec   Loss 5.1148   LearningRate 0.0149   Epoch: 12   Global Step: 152460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:21,068-Speed 3041.02 samples/sec   Loss 5.1124   LearningRate 0.0149   Epoch: 12   Global Step: 152470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:24,441-Speed 3036.65 samples/sec   Loss 5.2021   LearningRate 0.0149   Epoch: 12   Global Step: 152480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:27,835-Speed 3018.20 samples/sec   Loss 5.2318   LearningRate 0.0149   Epoch: 12   Global Step: 152490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:31,177-Speed 3065.04 samples/sec   Loss 5.2234   LearningRate 0.0149   Epoch: 12   Global Step: 152500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:34,524-Speed 3060.40 samples/sec   Loss 5.2133   LearningRate 0.0149   Epoch: 12   Global Step: 152510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:38,001-Speed 2945.14 samples/sec   Loss 5.1605   LearningRate 0.0149   Epoch: 12   Global Step: 152520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:41,353-Speed 3056.14 samples/sec   Loss 5.1429   LearningRate 0.0149   Epoch: 12   Global Step: 152530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:44,822-Speed 2952.45 samples/sec   Loss 5.2875   LearningRate 0.0149   Epoch: 12   Global Step: 152540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:48,268-Speed 2972.46 samples/sec   Loss 5.2317   LearningRate 0.0149   Epoch: 12   Global Step: 152550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:51,663-Speed 3017.35 samples/sec   Loss 5.2172   LearningRate 0.0149   Epoch: 12   Global Step: 152560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:55,081-Speed 2996.45 samples/sec   Loss 5.2819   LearningRate 0.0149   Epoch: 12   Global Step: 152570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:09:58,508-Speed 2988.84 samples/sec   Loss 5.2377   LearningRate 0.0149   Epoch: 12   Global Step: 152580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:10:01,901-Speed 3019.11 samples/sec   Loss 5.1127   LearningRate 0.0149   Epoch: 12   Global Step: 152590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:10:05,276-Speed 3035.00 samples/sec   Loss 5.0997   LearningRate 0.0149   Epoch: 12   Global Step: 152600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:10:08,580-Speed 3100.29 samples/sec   Loss 5.2472   LearningRate 0.0149   Epoch: 12   Global Step: 152610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:10:11,940-Speed 3048.56 samples/sec   Loss 5.2688   LearningRate 0.0149   Epoch: 12   Global Step: 152620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:10:15,305-Speed 3044.35 samples/sec   Loss 5.1226   LearningRate 0.0149   Epoch: 12   Global Step: 152630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:10:18,695-Speed 3021.58 samples/sec   Loss 5.2250   LearningRate 0.0149   Epoch: 12   Global Step: 152640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:10:22,139-Speed 2974.30 samples/sec   Loss 5.1946   LearningRate 0.0149   Epoch: 12   Global Step: 152650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:10:25,598-Speed 2961.64 samples/sec   Loss 5.2274   LearningRate 0.0149   Epoch: 12   Global Step: 152660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:10:29,025-Speed 2989.15 samples/sec   Loss 5.1132   LearningRate 0.0149   Epoch: 12   Global Step: 152670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:10:32,463-Speed 2979.08 samples/sec   Loss 5.1450   LearningRate 0.0149   Epoch: 12   Global Step: 152680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:10:35,779-Speed 3088.61 samples/sec   Loss 5.1641   LearningRate 0.0148   Epoch: 12   Global Step: 152690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:10:39,183-Speed 3009.24 samples/sec   Loss 5.2693   LearningRate 0.0148   Epoch: 12   Global Step: 152700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:10:42,582-Speed 3013.60 samples/sec   Loss 5.1750   LearningRate 0.0148   Epoch: 12   Global Step: 152710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:10:45,997-Speed 2998.90 samples/sec   Loss 5.1957   LearningRate 0.0148   Epoch: 12   Global Step: 152720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:10:49,407-Speed 3004.46 samples/sec   Loss 5.2999   LearningRate 0.0148   Epoch: 12   Global Step: 152730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:10:52,843-Speed 2981.42 samples/sec   Loss 5.1834   LearningRate 0.0148   Epoch: 12   Global Step: 152740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:10:56,303-Speed 2960.23 samples/sec   Loss 5.1568   LearningRate 0.0148   Epoch: 12   Global Step: 152750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:10:59,783-Speed 2943.65 samples/sec   Loss 5.2790   LearningRate 0.0148   Epoch: 12   Global Step: 152760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:11:03,185-Speed 3010.74 samples/sec   Loss 5.2034   LearningRate 0.0148   Epoch: 12   Global Step: 152770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:11:06,572-Speed 3024.27 samples/sec   Loss 5.1526   LearningRate 0.0148   Epoch: 12   Global Step: 152780   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:11:09,953-Speed 3029.63 samples/sec   Loss 5.2297   LearningRate 0.0148   Epoch: 12   Global Step: 152790   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:11:13,418-Speed 2955.79 samples/sec   Loss 5.2612   LearningRate 0.0148   Epoch: 12   Global Step: 152800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:11:16,853-Speed 2982.10 samples/sec   Loss 5.2141   LearningRate 0.0148   Epoch: 12   Global Step: 152810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:11:20,187-Speed 3072.51 samples/sec   Loss 5.1670   LearningRate 0.0148   Epoch: 12   Global Step: 152820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:11:23,557-Speed 3038.79 samples/sec   Loss 5.1345   LearningRate 0.0148   Epoch: 12   Global Step: 152830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:11:26,975-Speed 2997.05 samples/sec   Loss 5.2195   LearningRate 0.0148   Epoch: 12   Global Step: 152840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:11:30,354-Speed 3030.82 samples/sec   Loss 5.1443   LearningRate 0.0148   Epoch: 12   Global Step: 152850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:11:33,780-Speed 2990.06 samples/sec   Loss 5.2217   LearningRate 0.0148   Epoch: 12   Global Step: 152860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:11:37,239-Speed 2961.26 samples/sec   Loss 5.2506   LearningRate 0.0148   Epoch: 12   Global Step: 152870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:11:40,697-Speed 2962.22 samples/sec   Loss 5.2700   LearningRate 0.0148   Epoch: 12   Global Step: 152880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:11:44,121-Speed 2991.42 samples/sec   Loss 5.3004   LearningRate 0.0148   Epoch: 12   Global Step: 152890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:11:47,557-Speed 2981.33 samples/sec   Loss 5.2459   LearningRate 0.0148   Epoch: 12   Global Step: 152900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:11:50,947-Speed 3021.15 samples/sec   Loss 5.2789   LearningRate 0.0148   Epoch: 12   Global Step: 152910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:11:54,258-Speed 3093.99 samples/sec   Loss 5.1893   LearningRate 0.0148   Epoch: 12   Global Step: 152920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:11:57,637-Speed 3030.71 samples/sec   Loss 5.2101   LearningRate 0.0148   Epoch: 12   Global Step: 152930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:12:01,058-Speed 2994.42 samples/sec   Loss 5.2643   LearningRate 0.0148   Epoch: 12   Global Step: 152940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:12:04,448-Speed 3021.64 samples/sec   Loss 5.2895   LearningRate 0.0148   Epoch: 12   Global Step: 152950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:12:07,806-Speed 3050.32 samples/sec   Loss 5.1651   LearningRate 0.0148   Epoch: 12   Global Step: 152960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:12:11,142-Speed 3070.56 samples/sec   Loss 5.2924   LearningRate 0.0148   Epoch: 12   Global Step: 152970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:12:14,548-Speed 3007.99 samples/sec   Loss 5.2900   LearningRate 0.0148   Epoch: 12   Global Step: 152980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:12:17,893-Speed 3061.90 samples/sec   Loss 5.2401   LearningRate 0.0148   Epoch: 12   Global Step: 152990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:12:21,303-Speed 3003.22 samples/sec   Loss 5.2493   LearningRate 0.0148   Epoch: 12   Global Step: 153000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:12:24,699-Speed 3016.81 samples/sec   Loss 5.2401   LearningRate 0.0148   Epoch: 12   Global Step: 153010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:12:28,040-Speed 3065.56 samples/sec   Loss 5.2475   LearningRate 0.0147   Epoch: 12   Global Step: 153020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:12:31,423-Speed 3028.28 samples/sec   Loss 5.3219   LearningRate 0.0147   Epoch: 12   Global Step: 153030   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:12:34,827-Speed 3008.75 samples/sec   Loss 5.1939   LearningRate 0.0147   Epoch: 12   Global Step: 153040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:12:38,209-Speed 3028.52 samples/sec   Loss 5.2384   LearningRate 0.0147   Epoch: 12   Global Step: 153050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:12:41,529-Speed 3086.05 samples/sec   Loss 5.2652   LearningRate 0.0147   Epoch: 12   Global Step: 153060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:12:44,893-Speed 3045.16 samples/sec   Loss 5.1994   LearningRate 0.0147   Epoch: 12   Global Step: 153070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:12:48,234-Speed 3065.21 samples/sec   Loss 5.2190   LearningRate 0.0147   Epoch: 12   Global Step: 153080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:12:51,557-Speed 3082.85 samples/sec   Loss 5.1661   LearningRate 0.0147   Epoch: 12   Global Step: 153090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:12:54,959-Speed 3010.70 samples/sec   Loss 5.1818   LearningRate 0.0147   Epoch: 12   Global Step: 153100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:12:58,362-Speed 3010.05 samples/sec   Loss 5.3099   LearningRate 0.0147   Epoch: 12   Global Step: 153110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:01,693-Speed 3074.58 samples/sec   Loss 5.2622   LearningRate 0.0147   Epoch: 12   Global Step: 153120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:05,146-Speed 2966.49 samples/sec   Loss 5.2969   LearningRate 0.0147   Epoch: 12   Global Step: 153130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:08,522-Speed 3034.33 samples/sec   Loss 5.1938   LearningRate 0.0147   Epoch: 12   Global Step: 153140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:13:11,863-Speed 3066.32 samples/sec   Loss 5.3181   LearningRate 0.0147   Epoch: 12   Global Step: 153150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:13:15,232-Speed 3039.37 samples/sec   Loss 5.3494   LearningRate 0.0147   Epoch: 12   Global Step: 153160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:13:18,649-Speed 2998.12 samples/sec   Loss 5.3477   LearningRate 0.0147   Epoch: 12   Global Step: 153170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:22,032-Speed 3027.42 samples/sec   Loss 5.3240   LearningRate 0.0147   Epoch: 12   Global Step: 153180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:25,485-Speed 2966.66 samples/sec   Loss 5.2893   LearningRate 0.0147   Epoch: 12   Global Step: 153190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:28,885-Speed 3012.35 samples/sec   Loss 5.2061   LearningRate 0.0147   Epoch: 12   Global Step: 153200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:32,276-Speed 3021.10 samples/sec   Loss 5.3222   LearningRate 0.0147   Epoch: 12   Global Step: 153210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:35,665-Speed 3022.66 samples/sec   Loss 5.1959   LearningRate 0.0147   Epoch: 12   Global Step: 153220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:39,059-Speed 3017.73 samples/sec   Loss 5.2450   LearningRate 0.0147   Epoch: 12   Global Step: 153230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:42,539-Speed 2943.76 samples/sec   Loss 5.2239   LearningRate 0.0147   Epoch: 12   Global Step: 153240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:45,982-Speed 2974.85 samples/sec   Loss 5.3586   LearningRate 0.0147   Epoch: 12   Global Step: 153250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:49,388-Speed 3006.66 samples/sec   Loss 5.2730   LearningRate 0.0147   Epoch: 12   Global Step: 153260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:13:52,795-Speed 3006.83 samples/sec   Loss 5.2777   LearningRate 0.0147   Epoch: 12   Global Step: 153270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:13:56,181-Speed 3025.39 samples/sec   Loss 5.2561   LearningRate 0.0147   Epoch: 12   Global Step: 153280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:13:59,544-Speed 3045.56 samples/sec   Loss 5.2219   LearningRate 0.0147   Epoch: 12   Global Step: 153290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:03,043-Speed 2927.98 samples/sec   Loss 5.2718   LearningRate 0.0147   Epoch: 12   Global Step: 153300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:06,439-Speed 3015.86 samples/sec   Loss 5.2420   LearningRate 0.0147   Epoch: 12   Global Step: 153310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:09,830-Speed 3021.15 samples/sec   Loss 5.3430   LearningRate 0.0147   Epoch: 12   Global Step: 153320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:13,213-Speed 3027.87 samples/sec   Loss 5.1785   LearningRate 0.0147   Epoch: 12   Global Step: 153330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:16,606-Speed 3018.36 samples/sec   Loss 5.2391   LearningRate 0.0146   Epoch: 12   Global Step: 153340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:19,983-Speed 3033.37 samples/sec   Loss 5.3195   LearningRate 0.0146   Epoch: 12   Global Step: 153350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:23,440-Speed 2963.13 samples/sec   Loss 5.2948   LearningRate 0.0146   Epoch: 12   Global Step: 153360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:26,956-Speed 2913.69 samples/sec   Loss 5.3362   LearningRate 0.0146   Epoch: 12   Global Step: 153370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:30,335-Speed 3030.79 samples/sec   Loss 5.2261   LearningRate 0.0146   Epoch: 12   Global Step: 153380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:33,728-Speed 3019.03 samples/sec   Loss 5.2837   LearningRate 0.0146   Epoch: 12   Global Step: 153390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:37,123-Speed 3017.35 samples/sec   Loss 5.4372   LearningRate 0.0146   Epoch: 12   Global Step: 153400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:40,498-Speed 3035.41 samples/sec   Loss 5.3119   LearningRate 0.0146   Epoch: 12   Global Step: 153410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:43,856-Speed 3049.93 samples/sec   Loss 5.1542   LearningRate 0.0146   Epoch: 12   Global Step: 153420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:47,245-Speed 3022.51 samples/sec   Loss 5.2529   LearningRate 0.0146   Epoch: 12   Global Step: 153430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:50,602-Speed 3050.95 samples/sec   Loss 5.3586   LearningRate 0.0146   Epoch: 12   Global Step: 153440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:54,034-Speed 2984.59 samples/sec   Loss 5.2743   LearningRate 0.0146   Epoch: 12   Global Step: 153450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:14:57,418-Speed 3026.56 samples/sec   Loss 5.1915   LearningRate 0.0146   Epoch: 12   Global Step: 153460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:00,871-Speed 2967.06 samples/sec   Loss 5.1983   LearningRate 0.0146   Epoch: 12   Global Step: 153470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:04,336-Speed 2955.67 samples/sec   Loss 5.3701   LearningRate 0.0146   Epoch: 12   Global Step: 153480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:07,770-Speed 2983.51 samples/sec   Loss 5.2208   LearningRate 0.0146   Epoch: 12   Global Step: 153490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:15:11,154-Speed 3026.08 samples/sec   Loss 5.3429   LearningRate 0.0146   Epoch: 12   Global Step: 153500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:15:14,483-Speed 3077.46 samples/sec   Loss 5.2303   LearningRate 0.0146   Epoch: 12   Global Step: 153510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:17,867-Speed 3026.29 samples/sec   Loss 5.2841   LearningRate 0.0146   Epoch: 12   Global Step: 153520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:21,265-Speed 3014.53 samples/sec   Loss 5.3336   LearningRate 0.0146   Epoch: 12   Global Step: 153530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:24,591-Speed 3079.66 samples/sec   Loss 5.2474   LearningRate 0.0146   Epoch: 12   Global Step: 153540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:27,976-Speed 3026.05 samples/sec   Loss 5.2227   LearningRate 0.0146   Epoch: 12   Global Step: 153550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:31,355-Speed 3031.37 samples/sec   Loss 5.1919   LearningRate 0.0146   Epoch: 12   Global Step: 153560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:34,795-Speed 2978.23 samples/sec   Loss 5.3091   LearningRate 0.0146   Epoch: 12   Global Step: 153570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:38,185-Speed 3021.22 samples/sec   Loss 5.1683   LearningRate 0.0146   Epoch: 12   Global Step: 153580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:41,629-Speed 2974.11 samples/sec   Loss 5.3216   LearningRate 0.0146   Epoch: 12   Global Step: 153590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:45,011-Speed 3028.71 samples/sec   Loss 5.2533   LearningRate 0.0146   Epoch: 12   Global Step: 153600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:15:48,431-Speed 2994.66 samples/sec   Loss 5.3116   LearningRate 0.0146   Epoch: 12   Global Step: 153610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:15:51,757-Speed 3079.68 samples/sec   Loss 5.2103   LearningRate 0.0146   Epoch: 12   Global Step: 153620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:15:55,108-Speed 3057.36 samples/sec   Loss 5.2891   LearningRate 0.0146   Epoch: 12   Global Step: 153630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:15:58,532-Speed 2991.07 samples/sec   Loss 5.2788   LearningRate 0.0146   Epoch: 12   Global Step: 153640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:16:01,957-Speed 2991.10 samples/sec   Loss 5.2660   LearningRate 0.0146   Epoch: 12   Global Step: 153650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:16:05,350-Speed 3018.25 samples/sec   Loss 5.2187   LearningRate 0.0146   Epoch: 12   Global Step: 153660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:16:08,696-Speed 3061.60 samples/sec   Loss 5.3678   LearningRate 0.0145   Epoch: 12   Global Step: 153670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:16:12,088-Speed 3019.17 samples/sec   Loss 5.2426   LearningRate 0.0145   Epoch: 12   Global Step: 153680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:16:15,441-Speed 3055.54 samples/sec   Loss 5.3542   LearningRate 0.0145   Epoch: 12   Global Step: 153690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:16:18,853-Speed 3002.09 samples/sec   Loss 5.2923   LearningRate 0.0145   Epoch: 12   Global Step: 153700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:16:22,235-Speed 3027.65 samples/sec   Loss 5.2526   LearningRate 0.0145   Epoch: 12   Global Step: 153710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:16:25,594-Speed 3049.89 samples/sec   Loss 5.3258   LearningRate 0.0145   Epoch: 12   Global Step: 153720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:16:28,978-Speed 3026.71 samples/sec   Loss 5.4100   LearningRate 0.0145   Epoch: 12   Global Step: 153730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:16:32,323-Speed 3062.69 samples/sec   Loss 5.3876   LearningRate 0.0145   Epoch: 12   Global Step: 153740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:16:35,760-Speed 2980.39 samples/sec   Loss 5.3215   LearningRate 0.0145   Epoch: 12   Global Step: 153750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:16:39,138-Speed 3031.88 samples/sec   Loss 5.2357   LearningRate 0.0145   Epoch: 12   Global Step: 153760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:16:42,583-Speed 2973.61 samples/sec   Loss 5.3083   LearningRate 0.0145   Epoch: 12   Global Step: 153770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:16:45,940-Speed 3051.01 samples/sec   Loss 5.2214   LearningRate 0.0145   Epoch: 12   Global Step: 153780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:16:49,284-Speed 3063.00 samples/sec   Loss 5.2850   LearningRate 0.0145   Epoch: 12   Global Step: 153790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:16:52,651-Speed 3043.39 samples/sec   Loss 5.3661   LearningRate 0.0145   Epoch: 12   Global Step: 153800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:16:55,995-Speed 3062.77 samples/sec   Loss 5.3382   LearningRate 0.0145   Epoch: 12   Global Step: 153810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:16:59,349-Speed 3053.43 samples/sec   Loss 5.3307   LearningRate 0.0145   Epoch: 12   Global Step: 153820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:17:02,714-Speed 3043.84 samples/sec   Loss 5.3103   LearningRate 0.0145   Epoch: 12   Global Step: 153830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:17:06,105-Speed 3021.21 samples/sec   Loss 5.2907   LearningRate 0.0145   Epoch: 12   Global Step: 153840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:09,474-Speed 3040.23 samples/sec   Loss 5.3660   LearningRate 0.0145   Epoch: 12   Global Step: 153850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:12,798-Speed 3081.41 samples/sec   Loss 5.3779   LearningRate 0.0145   Epoch: 12   Global Step: 153860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:16,162-Speed 3044.58 samples/sec   Loss 5.3071   LearningRate 0.0145   Epoch: 12   Global Step: 153870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:19,536-Speed 3036.53 samples/sec   Loss 5.3706   LearningRate 0.0145   Epoch: 12   Global Step: 153880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:22,907-Speed 3038.57 samples/sec   Loss 5.1883   LearningRate 0.0145   Epoch: 12   Global Step: 153890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:26,342-Speed 2981.68 samples/sec   Loss 5.3451   LearningRate 0.0145   Epoch: 12   Global Step: 153900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:29,729-Speed 3023.64 samples/sec   Loss 5.3852   LearningRate 0.0145   Epoch: 12   Global Step: 153910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:33,085-Speed 3052.04 samples/sec   Loss 5.3615   LearningRate 0.0145   Epoch: 12   Global Step: 153920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:37,104-Speed 2548.45 samples/sec   Loss 5.2620   LearningRate 0.0145   Epoch: 12   Global Step: 153930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:40,429-Speed 3080.71 samples/sec   Loss 5.2414   LearningRate 0.0145   Epoch: 12   Global Step: 153940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:17:43,755-Speed 3079.97 samples/sec   Loss 5.3087   LearningRate 0.0145   Epoch: 12   Global Step: 153950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:47,679-Speed 2609.77 samples/sec   Loss 5.3124   LearningRate 0.0145   Epoch: 12   Global Step: 153960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:51,053-Speed 3035.92 samples/sec   Loss 5.3817   LearningRate 0.0145   Epoch: 12   Global Step: 153970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:54,371-Speed 3087.66 samples/sec   Loss 5.3800   LearningRate 0.0145   Epoch: 12   Global Step: 153980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:17:59,009-Speed 2207.86 samples/sec   Loss 5.3013   LearningRate 0.0144   Epoch: 12   Global Step: 153990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:02,413-Speed 3009.45 samples/sec   Loss 5.3590   LearningRate 0.0144   Epoch: 12   Global Step: 154000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:05,828-Speed 2999.34 samples/sec   Loss 5.3691   LearningRate 0.0144   Epoch: 12   Global Step: 154010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:09,194-Speed 3042.63 samples/sec   Loss 5.4098   LearningRate 0.0144   Epoch: 12   Global Step: 154020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:12,602-Speed 3006.50 samples/sec   Loss 5.2483   LearningRate 0.0144   Epoch: 12   Global Step: 154030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:16,047-Speed 2972.87 samples/sec   Loss 5.2796   LearningRate 0.0144   Epoch: 12   Global Step: 154040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:19,449-Speed 3011.47 samples/sec   Loss 5.3007   LearningRate 0.0144   Epoch: 12   Global Step: 154050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:22,818-Speed 3040.25 samples/sec   Loss 5.4087   LearningRate 0.0144   Epoch: 12   Global Step: 154060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:26,171-Speed 3054.34 samples/sec   Loss 5.3442   LearningRate 0.0144   Epoch: 12   Global Step: 154070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:29,540-Speed 3040.82 samples/sec   Loss 5.4136   LearningRate 0.0144   Epoch: 12   Global Step: 154080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:32,928-Speed 3023.19 samples/sec   Loss 5.3086   LearningRate 0.0144   Epoch: 12   Global Step: 154090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:36,350-Speed 2992.53 samples/sec   Loss 5.3892   LearningRate 0.0144   Epoch: 12   Global Step: 154100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:39,690-Speed 3067.04 samples/sec   Loss 5.3981   LearningRate 0.0144   Epoch: 12   Global Step: 154110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:43,093-Speed 3010.90 samples/sec   Loss 5.3012   LearningRate 0.0144   Epoch: 12   Global Step: 154120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:46,479-Speed 3025.36 samples/sec   Loss 5.4932   LearningRate 0.0144   Epoch: 12   Global Step: 154130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:49,825-Speed 3061.33 samples/sec   Loss 5.1709   LearningRate 0.0144   Epoch: 12   Global Step: 154140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:18:53,194-Speed 3040.28 samples/sec   Loss 5.3098   LearningRate 0.0144   Epoch: 12   Global Step: 154150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:18:56,508-Speed 3090.85 samples/sec   Loss 5.2518   LearningRate 0.0144   Epoch: 12   Global Step: 154160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:00,023-Speed 2914.07 samples/sec   Loss 5.3225   LearningRate 0.0144   Epoch: 12   Global Step: 154170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:03,378-Speed 3052.73 samples/sec   Loss 5.3898   LearningRate 0.0144   Epoch: 12   Global Step: 154180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:06,755-Speed 3033.16 samples/sec   Loss 5.2727   LearningRate 0.0144   Epoch: 12   Global Step: 154190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:10,115-Speed 3048.96 samples/sec   Loss 5.2504   LearningRate 0.0144   Epoch: 12   Global Step: 154200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:13,469-Speed 3054.72 samples/sec   Loss 5.3158   LearningRate 0.0144   Epoch: 12   Global Step: 154210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:16,912-Speed 2974.06 samples/sec   Loss 5.3326   LearningRate 0.0144   Epoch: 12   Global Step: 154220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:20,343-Speed 2985.76 samples/sec   Loss 5.3020   LearningRate 0.0144   Epoch: 12   Global Step: 154230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:23,686-Speed 3064.36 samples/sec   Loss 5.3701   LearningRate 0.0144   Epoch: 12   Global Step: 154240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:27,096-Speed 3003.17 samples/sec   Loss 5.2765   LearningRate 0.0144   Epoch: 12   Global Step: 154250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:30,548-Speed 2967.81 samples/sec   Loss 5.2996   LearningRate 0.0144   Epoch: 12   Global Step: 154260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:19:33,949-Speed 3011.18 samples/sec   Loss 5.3473   LearningRate 0.0144   Epoch: 12   Global Step: 154270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:19:37,356-Speed 3007.12 samples/sec   Loss 5.3114   LearningRate 0.0144   Epoch: 12   Global Step: 154280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:19:40,775-Speed 2996.07 samples/sec   Loss 5.2990   LearningRate 0.0144   Epoch: 12   Global Step: 154290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:19:44,135-Speed 3048.11 samples/sec   Loss 5.4063   LearningRate 0.0144   Epoch: 12   Global Step: 154300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:47,502-Speed 3041.51 samples/sec   Loss 5.3314   LearningRate 0.0144   Epoch: 12   Global Step: 154310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:50,896-Speed 3029.36 samples/sec   Loss 5.2962   LearningRate 0.0143   Epoch: 12   Global Step: 154320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:54,821-Speed 2609.22 samples/sec   Loss 5.3332   LearningRate 0.0143   Epoch: 12   Global Step: 154330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:19:58,801-Speed 2573.27 samples/sec   Loss 5.3503   LearningRate 0.0143   Epoch: 12   Global Step: 154340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:20:02,194-Speed 3019.30 samples/sec   Loss 5.4159   LearningRate 0.0143   Epoch: 12   Global Step: 154350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:20:05,620-Speed 2989.06 samples/sec   Loss 5.3833   LearningRate 0.0143   Epoch: 12   Global Step: 154360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:20:09,009-Speed 3022.79 samples/sec   Loss 5.2345   LearningRate 0.0143   Epoch: 12   Global Step: 154370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:20:12,440-Speed 2985.52 samples/sec   Loss 5.4274   LearningRate 0.0143   Epoch: 12   Global Step: 154380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:20:15,822-Speed 3028.76 samples/sec   Loss 5.2538   LearningRate 0.0143   Epoch: 12   Global Step: 154390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:20:19,268-Speed 2972.05 samples/sec   Loss 5.3406   LearningRate 0.0143   Epoch: 12   Global Step: 154400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:20:22,743-Speed 2947.81 samples/sec   Loss 5.2610   LearningRate 0.0143   Epoch: 12   Global Step: 154410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:20:26,135-Speed 3020.25 samples/sec   Loss 5.3898   LearningRate 0.0143   Epoch: 12   Global Step: 154420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:20:29,493-Speed 3049.71 samples/sec   Loss 5.3262   LearningRate 0.0143   Epoch: 12   Global Step: 154430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:20:32,857-Speed 3045.16 samples/sec   Loss 5.2970   LearningRate 0.0143   Epoch: 12   Global Step: 154440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:20:36,185-Speed 3077.27 samples/sec   Loss 5.3027   LearningRate 0.0143   Epoch: 12   Global Step: 154450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:20:39,620-Speed 2982.53 samples/sec   Loss 5.3216   LearningRate 0.0143   Epoch: 12   Global Step: 154460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:20:42,989-Speed 3039.79 samples/sec   Loss 5.3991   LearningRate 0.0143   Epoch: 12   Global Step: 154470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:20:46,450-Speed 2959.88 samples/sec   Loss 5.3022   LearningRate 0.0143   Epoch: 12   Global Step: 154480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:20:49,821-Speed 3038.17 samples/sec   Loss 5.2838   LearningRate 0.0143   Epoch: 12   Global Step: 154490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:20:53,293-Speed 2950.17 samples/sec   Loss 5.2779   LearningRate 0.0143   Epoch: 12   Global Step: 154500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:20:56,697-Speed 3009.50 samples/sec   Loss 5.3351   LearningRate 0.0143   Epoch: 12   Global Step: 154510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:21:00,132-Speed 2981.87 samples/sec   Loss 5.3883   LearningRate 0.0143   Epoch: 12   Global Step: 154520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:21:03,482-Speed 3057.97 samples/sec   Loss 5.2672   LearningRate 0.0143   Epoch: 12   Global Step: 154530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:21:06,953-Speed 2950.34 samples/sec   Loss 5.2989   LearningRate 0.0143   Epoch: 12   Global Step: 154540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:21:10,401-Speed 2971.17 samples/sec   Loss 5.3124   LearningRate 0.0143   Epoch: 12   Global Step: 154550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:21:13,747-Speed 3061.50 samples/sec   Loss 5.3899   LearningRate 0.0143   Epoch: 12   Global Step: 154560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:21:17,242-Speed 2929.92 samples/sec   Loss 5.3194   LearningRate 0.0143   Epoch: 12   Global Step: 154570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:21:20,625-Speed 3028.94 samples/sec   Loss 5.4106   LearningRate 0.0143   Epoch: 12   Global Step: 154580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:21:24,036-Speed 3003.62 samples/sec   Loss 5.3496   LearningRate 0.0143   Epoch: 12   Global Step: 154590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:21:27,409-Speed 3036.23 samples/sec   Loss 5.3982   LearningRate 0.0143   Epoch: 12   Global Step: 154600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:21:30,918-Speed 2919.66 samples/sec   Loss 5.2360   LearningRate 0.0143   Epoch: 12   Global Step: 154610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:21:34,267-Speed 3057.73 samples/sec   Loss 5.4225   LearningRate 0.0143   Epoch: 12   Global Step: 154620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:21:37,672-Speed 3008.84 samples/sec   Loss 5.3309   LearningRate 0.0143   Epoch: 12   Global Step: 154630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:21:40,986-Speed 3090.68 samples/sec   Loss 5.3845   LearningRate 0.0143   Epoch: 12   Global Step: 154640   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:21:44,337-Speed 3056.16 samples/sec   Loss 5.2442   LearningRate 0.0142   Epoch: 12   Global Step: 154650   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:21:47,723-Speed 3025.41 samples/sec   Loss 5.3426   LearningRate 0.0142   Epoch: 12   Global Step: 154660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:21:51,066-Speed 3064.30 samples/sec   Loss 5.3214   LearningRate 0.0142   Epoch: 12   Global Step: 154670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:21:54,446-Speed 3030.33 samples/sec   Loss 5.2811   LearningRate 0.0142   Epoch: 12   Global Step: 154680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:21:57,905-Speed 2961.47 samples/sec   Loss 5.3645   LearningRate 0.0142   Epoch: 12   Global Step: 154690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:22:01,208-Speed 3100.49 samples/sec   Loss 5.3945   LearningRate 0.0142   Epoch: 12   Global Step: 154700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:22:04,616-Speed 3005.45 samples/sec   Loss 5.3617   LearningRate 0.0142   Epoch: 12   Global Step: 154710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:22:07,973-Speed 3052.02 samples/sec   Loss 5.3047   LearningRate 0.0142   Epoch: 12   Global Step: 154720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:22:11,444-Speed 2950.86 samples/sec   Loss 5.3825   LearningRate 0.0142   Epoch: 12   Global Step: 154730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:22:14,844-Speed 3012.35 samples/sec   Loss 5.2877   LearningRate 0.0142   Epoch: 12   Global Step: 154740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:22:18,272-Speed 2988.26 samples/sec   Loss 5.3206   LearningRate 0.0142   Epoch: 12   Global Step: 154750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:22:21,691-Speed 2996.51 samples/sec   Loss 5.5169   LearningRate 0.0142   Epoch: 12   Global Step: 154760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:22:25,005-Speed 3090.57 samples/sec   Loss 5.2699   LearningRate 0.0142   Epoch: 12   Global Step: 154770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:22:28,337-Speed 3073.72 samples/sec   Loss 5.4102   LearningRate 0.0142   Epoch: 12   Global Step: 154780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:22:31,755-Speed 2996.94 samples/sec   Loss 5.3194   LearningRate 0.0142   Epoch: 12   Global Step: 154790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:22:35,185-Speed 2985.74 samples/sec   Loss 5.2863   LearningRate 0.0142   Epoch: 12   Global Step: 154800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:22:38,572-Speed 3025.78 samples/sec   Loss 5.3072   LearningRate 0.0142   Epoch: 12   Global Step: 154810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:22:41,914-Speed 3064.78 samples/sec   Loss 5.3810   LearningRate 0.0142   Epoch: 12   Global Step: 154820   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:22:45,333-Speed 2995.71 samples/sec   Loss 5.3211   LearningRate 0.0142   Epoch: 12   Global Step: 154830   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:22:48,812-Speed 2944.14 samples/sec   Loss 5.2711   LearningRate 0.0142   Epoch: 12   Global Step: 154840   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:22:52,199-Speed 3023.94 samples/sec   Loss 5.3646   LearningRate 0.0142   Epoch: 12   Global Step: 154850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:22:55,621-Speed 2993.35 samples/sec   Loss 5.3596   LearningRate 0.0142   Epoch: 12   Global Step: 154860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:22:58,990-Speed 3040.87 samples/sec   Loss 5.3673   LearningRate 0.0142   Epoch: 12   Global Step: 154870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:23:02,478-Speed 2936.19 samples/sec   Loss 5.3682   LearningRate 0.0142   Epoch: 12   Global Step: 154880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:23:05,814-Speed 3070.31 samples/sec   Loss 5.3636   LearningRate 0.0142   Epoch: 12   Global Step: 154890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:23:09,197-Speed 3028.07 samples/sec   Loss 5.3917   LearningRate 0.0142   Epoch: 12   Global Step: 154900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:23:12,550-Speed 3054.94 samples/sec   Loss 5.3308   LearningRate 0.0142   Epoch: 12   Global Step: 154910   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:23:15,937-Speed 3024.26 samples/sec   Loss 5.4013   LearningRate 0.0142   Epoch: 12   Global Step: 154920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:23:19,362-Speed 2990.26 samples/sec   Loss 5.2649   LearningRate 0.0142   Epoch: 12   Global Step: 154930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:23:22,824-Speed 2959.18 samples/sec   Loss 5.3278   LearningRate 0.0142   Epoch: 12   Global Step: 154940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:23:26,182-Speed 3050.10 samples/sec   Loss 5.3402   LearningRate 0.0142   Epoch: 12   Global Step: 154950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:23:29,542-Speed 3048.18 samples/sec   Loss 5.5277   LearningRate 0.0142   Epoch: 12   Global Step: 154960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:23:32,884-Speed 3065.07 samples/sec   Loss 5.4134   LearningRate 0.0142   Epoch: 12   Global Step: 154970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:23:36,269-Speed 3026.03 samples/sec   Loss 5.3346   LearningRate 0.0141   Epoch: 12   Global Step: 154980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:23:39,699-Speed 2986.88 samples/sec   Loss 5.5005   LearningRate 0.0141   Epoch: 12   Global Step: 154990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:23:43,131-Speed 2983.64 samples/sec   Loss 5.3961   LearningRate 0.0141   Epoch: 12   Global Step: 155000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:23:46,619-Speed 2936.65 samples/sec   Loss 5.3751   LearningRate 0.0141   Epoch: 12   Global Step: 155010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:23:50,017-Speed 3014.77 samples/sec   Loss 5.4412   LearningRate 0.0141   Epoch: 12   Global Step: 155020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:23:53,406-Speed 3022.78 samples/sec   Loss 5.3175   LearningRate 0.0141   Epoch: 12   Global Step: 155030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:23:56,869-Speed 2957.85 samples/sec   Loss 5.3353   LearningRate 0.0141   Epoch: 12   Global Step: 155040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:24:00,267-Speed 3014.06 samples/sec   Loss 5.3034   LearningRate 0.0141   Epoch: 12   Global Step: 155050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:24:03,614-Speed 3060.06 samples/sec   Loss 5.3101   LearningRate 0.0141   Epoch: 12   Global Step: 155060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:24:07,050-Speed 2981.49 samples/sec   Loss 5.4769   LearningRate 0.0141   Epoch: 12   Global Step: 155070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:24:10,518-Speed 2953.60 samples/sec   Loss 5.3108   LearningRate 0.0141   Epoch: 12   Global Step: 155080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:24:13,920-Speed 3011.19 samples/sec   Loss 5.3534   LearningRate 0.0141   Epoch: 12   Global Step: 155090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:24:17,365-Speed 2972.69 samples/sec   Loss 5.3085   LearningRate 0.0141   Epoch: 12   Global Step: 155100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:24:20,752-Speed 3024.18 samples/sec   Loss 5.3531   LearningRate 0.0141   Epoch: 12   Global Step: 155110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:24:24,141-Speed 3023.25 samples/sec   Loss 5.3538   LearningRate 0.0141   Epoch: 12   Global Step: 155120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:24:27,539-Speed 3014.38 samples/sec   Loss 5.3906   LearningRate 0.0141   Epoch: 12   Global Step: 155130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:24:30,934-Speed 3016.23 samples/sec   Loss 5.4551   LearningRate 0.0141   Epoch: 12   Global Step: 155140   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:24:34,316-Speed 3029.49 samples/sec   Loss 5.3472   LearningRate 0.0141   Epoch: 12   Global Step: 155150   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:24:37,677-Speed 3047.00 samples/sec   Loss 5.2283   LearningRate 0.0141   Epoch: 12   Global Step: 155160   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:24:41,088-Speed 3003.08 samples/sec   Loss 5.4187   LearningRate 0.0141   Epoch: 12   Global Step: 155170   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:24:44,546-Speed 2962.16 samples/sec   Loss 5.4543   LearningRate 0.0141   Epoch: 12   Global Step: 155180   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:24:47,883-Speed 3069.27 samples/sec   Loss 5.2164   LearningRate 0.0141   Epoch: 12   Global Step: 155190   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:24:51,233-Speed 3057.53 samples/sec   Loss 5.3880   LearningRate 0.0141   Epoch: 12   Global Step: 155200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:24:54,603-Speed 3039.03 samples/sec   Loss 5.3150   LearningRate 0.0141   Epoch: 12   Global Step: 155210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:24:57,917-Speed 3091.27 samples/sec   Loss 5.3811   LearningRate 0.0141   Epoch: 12   Global Step: 155220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:25:01,295-Speed 3032.58 samples/sec   Loss 5.3134   LearningRate 0.0141   Epoch: 12   Global Step: 155230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:25:04,672-Speed 3032.74 samples/sec   Loss 5.3654   LearningRate 0.0141   Epoch: 12   Global Step: 155240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:25:08,121-Speed 2969.49 samples/sec   Loss 5.4089   LearningRate 0.0141   Epoch: 12   Global Step: 155250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:25:11,447-Speed 3079.87 samples/sec   Loss 5.2540   LearningRate 0.0141   Epoch: 12   Global Step: 155260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:25:14,793-Speed 3061.59 samples/sec   Loss 5.4298   LearningRate 0.0141   Epoch: 12   Global Step: 155270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:25:18,143-Speed 3057.49 samples/sec   Loss 5.3630   LearningRate 0.0141   Epoch: 12   Global Step: 155280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:25:21,580-Speed 2980.09 samples/sec   Loss 5.4243   LearningRate 0.0141   Epoch: 12   Global Step: 155290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:25:25,029-Speed 2969.66 samples/sec   Loss 5.3035   LearningRate 0.0141   Epoch: 12   Global Step: 155300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:25:28,483-Speed 2966.02 samples/sec   Loss 5.4893   LearningRate 0.0140   Epoch: 12   Global Step: 155310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:25:31,957-Speed 2947.86 samples/sec   Loss 5.3669   LearningRate 0.0140   Epoch: 12   Global Step: 155320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:25:35,422-Speed 2956.13 samples/sec   Loss 5.3869   LearningRate 0.0140   Epoch: 12   Global Step: 155330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:25:38,870-Speed 2970.74 samples/sec   Loss 5.3790   LearningRate 0.0140   Epoch: 12   Global Step: 155340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:25:42,228-Speed 3051.12 samples/sec   Loss 5.3949   LearningRate 0.0140   Epoch: 12   Global Step: 155350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:25:45,545-Speed 3088.11 samples/sec   Loss 5.3372   LearningRate 0.0140   Epoch: 12   Global Step: 155360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:25:48,979-Speed 2982.27 samples/sec   Loss 5.3543   LearningRate 0.0140   Epoch: 12   Global Step: 155370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:25:52,363-Speed 3027.30 samples/sec   Loss 5.3433   LearningRate 0.0140   Epoch: 12   Global Step: 155380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:25:55,738-Speed 3034.82 samples/sec   Loss 5.3694   LearningRate 0.0140   Epoch: 12   Global Step: 155390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:25:59,055-Speed 3087.74 samples/sec   Loss 5.3536   LearningRate 0.0140   Epoch: 12   Global Step: 155400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:26:02,402-Speed 3060.30 samples/sec   Loss 5.3802   LearningRate 0.0140   Epoch: 12   Global Step: 155410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:26:05,796-Speed 3017.63 samples/sec   Loss 5.3941   LearningRate 0.0140   Epoch: 12   Global Step: 155420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:26:09,114-Speed 3087.41 samples/sec   Loss 5.2434   LearningRate 0.0140   Epoch: 12   Global Step: 155430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:26:12,479-Speed 3044.30 samples/sec   Loss 5.2343   LearningRate 0.0140   Epoch: 12   Global Step: 155440   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:26:15,870-Speed 3020.26 samples/sec   Loss 5.4252   LearningRate 0.0140   Epoch: 12   Global Step: 155450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:26:19,252-Speed 3029.19 samples/sec   Loss 5.3137   LearningRate 0.0140   Epoch: 12   Global Step: 155460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:26:22,644-Speed 3019.59 samples/sec   Loss 5.3633   LearningRate 0.0140   Epoch: 12   Global Step: 155470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:26:26,055-Speed 3003.13 samples/sec   Loss 5.3146   LearningRate 0.0140   Epoch: 12   Global Step: 155480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:26:29,418-Speed 3045.20 samples/sec   Loss 5.4246   LearningRate 0.0140   Epoch: 12   Global Step: 155490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:26:32,855-Speed 2980.26 samples/sec   Loss 5.3870   LearningRate 0.0140   Epoch: 12   Global Step: 155500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:26:36,238-Speed 3027.60 samples/sec   Loss 5.3604   LearningRate 0.0140   Epoch: 12   Global Step: 155510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:26:39,617-Speed 3031.35 samples/sec   Loss 5.3967   LearningRate 0.0140   Epoch: 12   Global Step: 155520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:26:42,989-Speed 3037.96 samples/sec   Loss 5.3120   LearningRate 0.0140   Epoch: 12   Global Step: 155530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:26:46,367-Speed 3031.67 samples/sec   Loss 5.4145   LearningRate 0.0140   Epoch: 12   Global Step: 155540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:26:49,751-Speed 3027.47 samples/sec   Loss 5.4860   LearningRate 0.0140   Epoch: 12   Global Step: 155550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:26:53,175-Speed 2991.38 samples/sec   Loss 5.2945   LearningRate 0.0140   Epoch: 12   Global Step: 155560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:26:56,586-Speed 3003.00 samples/sec   Loss 5.3710   LearningRate 0.0140   Epoch: 12   Global Step: 155570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:00,100-Speed 2915.38 samples/sec   Loss 5.3399   LearningRate 0.0140   Epoch: 12   Global Step: 155580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:03,479-Speed 3031.33 samples/sec   Loss 5.3261   LearningRate 0.0140   Epoch: 12   Global Step: 155590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:06,829-Speed 3056.69 samples/sec   Loss 5.3803   LearningRate 0.0140   Epoch: 12   Global Step: 155600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:10,178-Speed 3059.15 samples/sec   Loss 5.5025   LearningRate 0.0140   Epoch: 12   Global Step: 155610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:13,496-Speed 3087.02 samples/sec   Loss 5.2970   LearningRate 0.0140   Epoch: 12   Global Step: 155620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:16,895-Speed 3013.48 samples/sec   Loss 5.3028   LearningRate 0.0140   Epoch: 12   Global Step: 155630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:20,319-Speed 2991.54 samples/sec   Loss 5.3289   LearningRate 0.0139   Epoch: 12   Global Step: 155640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:23,668-Speed 3059.26 samples/sec   Loss 5.4626   LearningRate 0.0139   Epoch: 12   Global Step: 155650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:27,069-Speed 3011.43 samples/sec   Loss 5.3784   LearningRate 0.0139   Epoch: 12   Global Step: 155660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:30,552-Speed 2941.00 samples/sec   Loss 5.5559   LearningRate 0.0139   Epoch: 12   Global Step: 155670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:33,882-Speed 3076.09 samples/sec   Loss 5.3448   LearningRate 0.0139   Epoch: 12   Global Step: 155680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:37,337-Speed 2964.15 samples/sec   Loss 5.4247   LearningRate 0.0139   Epoch: 12   Global Step: 155690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:40,751-Speed 3000.71 samples/sec   Loss 5.3566   LearningRate 0.0139   Epoch: 12   Global Step: 155700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:27:44,146-Speed 3016.84 samples/sec   Loss 5.4254   LearningRate 0.0139   Epoch: 12   Global Step: 155710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:27:47,535-Speed 3022.08 samples/sec   Loss 5.2626   LearningRate 0.0139   Epoch: 12   Global Step: 155720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:27:50,986-Speed 2968.74 samples/sec   Loss 5.4037   LearningRate 0.0139   Epoch: 12   Global Step: 155730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:27:54,334-Speed 3059.38 samples/sec   Loss 5.3275   LearningRate 0.0139   Epoch: 12   Global Step: 155740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:27:57,748-Speed 3000.19 samples/sec   Loss 5.2874   LearningRate 0.0139   Epoch: 12   Global Step: 155750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:28:01,078-Speed 3075.75 samples/sec   Loss 5.2765   LearningRate 0.0139   Epoch: 12   Global Step: 155760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:04,465-Speed 3024.60 samples/sec   Loss 5.3479   LearningRate 0.0139   Epoch: 12   Global Step: 155770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:07,957-Speed 2933.11 samples/sec   Loss 5.2998   LearningRate 0.0139   Epoch: 12   Global Step: 155780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:11,420-Speed 2957.73 samples/sec   Loss 5.5248   LearningRate 0.0139   Epoch: 12   Global Step: 155790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:14,758-Speed 3069.37 samples/sec   Loss 5.3010   LearningRate 0.0139   Epoch: 12   Global Step: 155800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:18,215-Speed 2962.54 samples/sec   Loss 5.3835   LearningRate 0.0139   Epoch: 12   Global Step: 155810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:21,573-Speed 3050.21 samples/sec   Loss 5.3217   LearningRate 0.0139   Epoch: 12   Global Step: 155820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:24,992-Speed 2996.27 samples/sec   Loss 5.3351   LearningRate 0.0139   Epoch: 12   Global Step: 155830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:28,365-Speed 3036.40 samples/sec   Loss 5.4355   LearningRate 0.0139   Epoch: 12   Global Step: 155840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:31,695-Speed 3075.84 samples/sec   Loss 5.4061   LearningRate 0.0139   Epoch: 12   Global Step: 155850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:35,065-Speed 3039.77 samples/sec   Loss 5.2787   LearningRate 0.0139   Epoch: 12   Global Step: 155860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:38,507-Speed 2975.60 samples/sec   Loss 5.3236   LearningRate 0.0139   Epoch: 12   Global Step: 155870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:41,884-Speed 3033.03 samples/sec   Loss 5.3807   LearningRate 0.0139   Epoch: 12   Global Step: 155880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:45,297-Speed 3001.01 samples/sec   Loss 5.4914   LearningRate 0.0139   Epoch: 12   Global Step: 155890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:48,641-Speed 3063.37 samples/sec   Loss 5.3294   LearningRate 0.0139   Epoch: 12   Global Step: 155900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:52,035-Speed 3017.67 samples/sec   Loss 5.3100   LearningRate 0.0139   Epoch: 12   Global Step: 155910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:55,470-Speed 2982.44 samples/sec   Loss 5.3460   LearningRate 0.0139   Epoch: 12   Global Step: 155920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:28:58,910-Speed 2977.21 samples/sec   Loss 5.3350   LearningRate 0.0139   Epoch: 12   Global Step: 155930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:29:02,329-Speed 2995.76 samples/sec   Loss 5.3558   LearningRate 0.0139   Epoch: 12   Global Step: 155940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:29:05,714-Speed 3026.74 samples/sec   Loss 5.3550   LearningRate 0.0139   Epoch: 12   Global Step: 155950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:29:09,125-Speed 3002.01 samples/sec   Loss 5.4051   LearningRate 0.0139   Epoch: 12   Global Step: 155960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:29:12,510-Speed 3026.23 samples/sec   Loss 5.4572   LearningRate 0.0138   Epoch: 12   Global Step: 155970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:29:15,887-Speed 3033.17 samples/sec   Loss 5.3713   LearningRate 0.0138   Epoch: 12   Global Step: 155980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:29:19,256-Speed 3040.63 samples/sec   Loss 5.3082   LearningRate 0.0138   Epoch: 12   Global Step: 155990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:29:22,586-Speed 3076.28 samples/sec   Loss 5.3701   LearningRate 0.0138   Epoch: 12   Global Step: 156000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:29:25,930-Speed 3062.81 samples/sec   Loss 5.3229   LearningRate 0.0138   Epoch: 12   Global Step: 156010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:29:29,266-Speed 3070.48 samples/sec   Loss 5.3489   LearningRate 0.0138   Epoch: 12   Global Step: 156020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:29:32,655-Speed 3022.27 samples/sec   Loss 5.3938   LearningRate 0.0138   Epoch: 12   Global Step: 156030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:29:35,994-Speed 3067.56 samples/sec   Loss 5.4249   LearningRate 0.0138   Epoch: 12   Global Step: 156040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:29:39,373-Speed 3031.28 samples/sec   Loss 5.4596   LearningRate 0.0138   Epoch: 12   Global Step: 156050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:29:42,783-Speed 3003.65 samples/sec   Loss 5.2780   LearningRate 0.0138   Epoch: 12   Global Step: 156060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:29:46,113-Speed 3076.57 samples/sec   Loss 5.3237   LearningRate 0.0138   Epoch: 12   Global Step: 156070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:29:49,532-Speed 2996.53 samples/sec   Loss 5.3860   LearningRate 0.0138   Epoch: 12   Global Step: 156080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:29:52,859-Speed 3078.27 samples/sec   Loss 5.3534   LearningRate 0.0138   Epoch: 12   Global Step: 156090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:29:56,222-Speed 3046.11 samples/sec   Loss 5.3576   LearningRate 0.0138   Epoch: 12   Global Step: 156100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:29:59,687-Speed 2956.73 samples/sec   Loss 5.2770   LearningRate 0.0138   Epoch: 12   Global Step: 156110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:30:03,087-Speed 3012.38 samples/sec   Loss 5.3407   LearningRate 0.0138   Epoch: 12   Global Step: 156120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:30:06,476-Speed 3021.70 samples/sec   Loss 5.1777   LearningRate 0.0138   Epoch: 12   Global Step: 156130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:30:09,863-Speed 3024.38 samples/sec   Loss 5.3624   LearningRate 0.0138   Epoch: 12   Global Step: 156140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:30:13,195-Speed 3074.49 samples/sec   Loss 5.5774   LearningRate 0.0138   Epoch: 12   Global Step: 156150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:30:16,539-Speed 3063.11 samples/sec   Loss 5.3616   LearningRate 0.0138   Epoch: 12   Global Step: 156160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:30:19,892-Speed 3054.95 samples/sec   Loss 5.2898   LearningRate 0.0138   Epoch: 12   Global Step: 156170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:30:23,224-Speed 3074.47 samples/sec   Loss 5.3156   LearningRate 0.0138   Epoch: 12   Global Step: 156180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:30:26,626-Speed 3010.24 samples/sec   Loss 5.3482   LearningRate 0.0138   Epoch: 12   Global Step: 156190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:30:29,972-Speed 3061.79 samples/sec   Loss 5.3839   LearningRate 0.0138   Epoch: 12   Global Step: 156200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:30:33,368-Speed 3015.95 samples/sec   Loss 5.4359   LearningRate 0.0138   Epoch: 12   Global Step: 156210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:30:36,760-Speed 3019.48 samples/sec   Loss 5.4261   LearningRate 0.0138   Epoch: 12   Global Step: 156220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:30:40,161-Speed 3012.31 samples/sec   Loss 5.4922   LearningRate 0.0138   Epoch: 12   Global Step: 156230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:30:43,601-Speed 2977.08 samples/sec   Loss 5.4507   LearningRate 0.0138   Epoch: 12   Global Step: 156240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:30:47,054-Speed 2966.20 samples/sec   Loss 5.3159   LearningRate 0.0138   Epoch: 12   Global Step: 156250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:30:50,463-Speed 3005.31 samples/sec   Loss 5.3072   LearningRate 0.0138   Epoch: 12   Global Step: 156260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:30:53,911-Speed 2970.45 samples/sec   Loss 5.3552   LearningRate 0.0138   Epoch: 12   Global Step: 156270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:30:57,344-Speed 2983.97 samples/sec   Loss 5.4705   LearningRate 0.0138   Epoch: 12   Global Step: 156280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:00,687-Speed 3064.19 samples/sec   Loss 5.3771   LearningRate 0.0138   Epoch: 12   Global Step: 156290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:04,135-Speed 2970.89 samples/sec   Loss 5.3738   LearningRate 0.0138   Epoch: 12   Global Step: 156300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:07,570-Speed 2981.50 samples/sec   Loss 5.2787   LearningRate 0.0137   Epoch: 12   Global Step: 156310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:10,893-Speed 3083.34 samples/sec   Loss 5.3744   LearningRate 0.0137   Epoch: 12   Global Step: 156320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:14,258-Speed 3043.07 samples/sec   Loss 5.4017   LearningRate 0.0137   Epoch: 12   Global Step: 156330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:17,707-Speed 2970.26 samples/sec   Loss 5.3723   LearningRate 0.0137   Epoch: 12   Global Step: 156340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:21,184-Speed 2946.05 samples/sec   Loss 5.3402   LearningRate 0.0137   Epoch: 12   Global Step: 156350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:24,659-Speed 2947.72 samples/sec   Loss 5.4308   LearningRate 0.0137   Epoch: 12   Global Step: 156360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:28,064-Speed 3008.53 samples/sec   Loss 5.2872   LearningRate 0.0137   Epoch: 12   Global Step: 156370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:31,423-Speed 3049.25 samples/sec   Loss 5.4857   LearningRate 0.0137   Epoch: 12   Global Step: 156380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:31:34,806-Speed 3027.99 samples/sec   Loss 5.4281   LearningRate 0.0137   Epoch: 12   Global Step: 156390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:31:38,271-Speed 2956.03 samples/sec   Loss 5.4308   LearningRate 0.0137   Epoch: 12   Global Step: 156400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:31:41,680-Speed 3005.16 samples/sec   Loss 5.2234   LearningRate 0.0137   Epoch: 12   Global Step: 156410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:31:45,076-Speed 3015.70 samples/sec   Loss 5.4594   LearningRate 0.0137   Epoch: 12   Global Step: 156420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:48,481-Speed 3008.70 samples/sec   Loss 5.3289   LearningRate 0.0137   Epoch: 12   Global Step: 156430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:51,876-Speed 3017.43 samples/sec   Loss 5.3650   LearningRate 0.0137   Epoch: 12   Global Step: 156440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:55,318-Speed 2975.41 samples/sec   Loss 5.4065   LearningRate 0.0137   Epoch: 12   Global Step: 156450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:31:58,762-Speed 2974.76 samples/sec   Loss 5.3120   LearningRate 0.0137   Epoch: 12   Global Step: 156460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:02,110-Speed 3059.31 samples/sec   Loss 5.2990   LearningRate 0.0137   Epoch: 12   Global Step: 156470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:05,620-Speed 2918.18 samples/sec   Loss 5.2811   LearningRate 0.0137   Epoch: 12   Global Step: 156480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:08,972-Speed 3055.47 samples/sec   Loss 5.3347   LearningRate 0.0137   Epoch: 12   Global Step: 156490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:12,404-Speed 2984.46 samples/sec   Loss 5.4088   LearningRate 0.0137   Epoch: 12   Global Step: 156500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:15,855-Speed 2968.69 samples/sec   Loss 5.4311   LearningRate 0.0137   Epoch: 12   Global Step: 156510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:19,238-Speed 3027.15 samples/sec   Loss 5.3337   LearningRate 0.0137   Epoch: 12   Global Step: 156520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:32:22,620-Speed 3028.79 samples/sec   Loss 5.3756   LearningRate 0.0137   Epoch: 12   Global Step: 156530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:32:26,026-Speed 3007.95 samples/sec   Loss 5.3146   LearningRate 0.0137   Epoch: 12   Global Step: 156540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:32:29,353-Speed 3079.20 samples/sec   Loss 5.3952   LearningRate 0.0137   Epoch: 12   Global Step: 156550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:32,709-Speed 3051.65 samples/sec   Loss 5.3761   LearningRate 0.0137   Epoch: 12   Global Step: 156560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:36,057-Speed 3059.66 samples/sec   Loss 5.3574   LearningRate 0.0137   Epoch: 12   Global Step: 156570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:39,423-Speed 3043.37 samples/sec   Loss 5.3629   LearningRate 0.0137   Epoch: 12   Global Step: 156580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:42,877-Speed 2964.86 samples/sec   Loss 5.3011   LearningRate 0.0137   Epoch: 12   Global Step: 156590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:46,346-Speed 2953.09 samples/sec   Loss 5.4444   LearningRate 0.0137   Epoch: 12   Global Step: 156600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:49,755-Speed 3004.04 samples/sec   Loss 5.3801   LearningRate 0.0137   Epoch: 12   Global Step: 156610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:53,163-Speed 3006.09 samples/sec   Loss 5.3837   LearningRate 0.0137   Epoch: 12   Global Step: 156620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:56,525-Speed 3046.53 samples/sec   Loss 5.4512   LearningRate 0.0137   Epoch: 12   Global Step: 156630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:32:59,917-Speed 3019.92 samples/sec   Loss 5.3192   LearningRate 0.0136   Epoch: 12   Global Step: 156640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:33:03,325-Speed 3005.08 samples/sec   Loss 5.3559   LearningRate 0.0136   Epoch: 12   Global Step: 156650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:33:06,755-Speed 2985.97 samples/sec   Loss 5.4731   LearningRate 0.0136   Epoch: 12   Global Step: 156660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:33:10,153-Speed 3015.03 samples/sec   Loss 5.4313   LearningRate 0.0136   Epoch: 12   Global Step: 156670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:33:13,512-Speed 3048.58 samples/sec   Loss 5.3286   LearningRate 0.0136   Epoch: 12   Global Step: 156680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:33:16,883-Speed 3038.71 samples/sec   Loss 5.4696   LearningRate 0.0136   Epoch: 12   Global Step: 156690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:33:20,272-Speed 3022.62 samples/sec   Loss 5.3974   LearningRate 0.0136   Epoch: 12   Global Step: 156700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:33:23,657-Speed 3026.01 samples/sec   Loss 5.2890   LearningRate 0.0136   Epoch: 12   Global Step: 156710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:33:26,968-Speed 3093.07 samples/sec   Loss 5.3504   LearningRate 0.0136   Epoch: 12   Global Step: 156720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:33:30,368-Speed 3012.79 samples/sec   Loss 5.2240   LearningRate 0.0136   Epoch: 12   Global Step: 156730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:33:33,718-Speed 3057.13 samples/sec   Loss 5.3833   LearningRate 0.0136   Epoch: 12   Global Step: 156740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:33:37,116-Speed 3015.04 samples/sec   Loss 5.4316   LearningRate 0.0136   Epoch: 12   Global Step: 156750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:33:40,512-Speed 3016.41 samples/sec   Loss 5.3944   LearningRate 0.0136   Epoch: 12   Global Step: 156760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:33:43,952-Speed 2976.65 samples/sec   Loss 5.4893   LearningRate 0.0136   Epoch: 12   Global Step: 156770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:33:47,457-Speed 2922.64 samples/sec   Loss 5.4573   LearningRate 0.0136   Epoch: 12   Global Step: 156780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:33:50,846-Speed 3022.65 samples/sec   Loss 5.4207   LearningRate 0.0136   Epoch: 12   Global Step: 156790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:33:54,253-Speed 3006.81 samples/sec   Loss 5.2862   LearningRate 0.0136   Epoch: 12   Global Step: 156800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:33:57,711-Speed 2961.54 samples/sec   Loss 5.2999   LearningRate 0.0136   Epoch: 12   Global Step: 156810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:34:01,142-Speed 2985.61 samples/sec   Loss 5.3855   LearningRate 0.0136   Epoch: 12   Global Step: 156820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:34:04,520-Speed 3031.98 samples/sec   Loss 5.3476   LearningRate 0.0136   Epoch: 12   Global Step: 156830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:34:07,888-Speed 3041.39 samples/sec   Loss 5.3263   LearningRate 0.0136   Epoch: 12   Global Step: 156840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:34:11,260-Speed 3037.96 samples/sec   Loss 5.4127   LearningRate 0.0136   Epoch: 12   Global Step: 156850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:34:14,650-Speed 3021.12 samples/sec   Loss 5.3339   LearningRate 0.0136   Epoch: 12   Global Step: 156860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:34:18,001-Speed 3057.14 samples/sec   Loss 5.3685   LearningRate 0.0136   Epoch: 12   Global Step: 156870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:34:21,409-Speed 3005.43 samples/sec   Loss 5.3732   LearningRate 0.0136   Epoch: 12   Global Step: 156880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:34:24,846-Speed 2979.82 samples/sec   Loss 5.4811   LearningRate 0.0136   Epoch: 12   Global Step: 156890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:34:28,245-Speed 3014.19 samples/sec   Loss 5.3409   LearningRate 0.0136   Epoch: 12   Global Step: 156900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:34:31,609-Speed 3044.90 samples/sec   Loss 5.3607   LearningRate 0.0136   Epoch: 12   Global Step: 156910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:34:34,982-Speed 3036.42 samples/sec   Loss 5.2930   LearningRate 0.0136   Epoch: 12   Global Step: 156920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:34:38,472-Speed 2934.94 samples/sec   Loss 5.3408   LearningRate 0.0136   Epoch: 12   Global Step: 156930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:34:41,876-Speed 3009.12 samples/sec   Loss 5.3861   LearningRate 0.0136   Epoch: 12   Global Step: 156940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:34:45,286-Speed 3003.47 samples/sec   Loss 5.3981   LearningRate 0.0136   Epoch: 12   Global Step: 156950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:34:48,685-Speed 3014.10 samples/sec   Loss 5.3139   LearningRate 0.0136   Epoch: 12   Global Step: 156960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:34:52,036-Speed 3056.08 samples/sec   Loss 5.4033   LearningRate 0.0136   Epoch: 12   Global Step: 156970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:34:55,540-Speed 2923.30 samples/sec   Loss 5.2675   LearningRate 0.0135   Epoch: 12   Global Step: 156980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:34:58,924-Speed 3026.79 samples/sec   Loss 5.3342   LearningRate 0.0135   Epoch: 12   Global Step: 156990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:35:02,266-Speed 3065.49 samples/sec   Loss 5.3845   LearningRate 0.0135   Epoch: 12   Global Step: 157000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:35:05,634-Speed 3041.30 samples/sec   Loss 5.3262   LearningRate 0.0135   Epoch: 12   Global Step: 157010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:35:09,033-Speed 3013.69 samples/sec   Loss 5.3986   LearningRate 0.0135   Epoch: 12   Global Step: 157020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:35:12,429-Speed 3016.79 samples/sec   Loss 5.3609   LearningRate 0.0135   Epoch: 12   Global Step: 157030   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:35:15,840-Speed 3003.01 samples/sec   Loss 5.4691   LearningRate 0.0135   Epoch: 12   Global Step: 157040   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:35:19,253-Speed 3000.68 samples/sec   Loss 5.3379   LearningRate 0.0135   Epoch: 12   Global Step: 157050   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:35:22,604-Speed 3056.94 samples/sec   Loss 5.3667   LearningRate 0.0135   Epoch: 12   Global Step: 157060   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:35:25,935-Speed 3074.72 samples/sec   Loss 5.4817   LearningRate 0.0135   Epoch: 12   Global Step: 157070   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:35:29,252-Speed 3088.26 samples/sec   Loss 5.4119   LearningRate 0.0135   Epoch: 12   Global Step: 157080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:35:32,602-Speed 3057.63 samples/sec   Loss 5.4418   LearningRate 0.0135   Epoch: 12   Global Step: 157090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:35:36,032-Speed 2985.94 samples/sec   Loss 5.3613   LearningRate 0.0135   Epoch: 12   Global Step: 157100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:35:39,482-Speed 2969.72 samples/sec   Loss 5.4635   LearningRate 0.0135   Epoch: 12   Global Step: 157110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:35:42,912-Speed 2986.11 samples/sec   Loss 5.4226   LearningRate 0.0135   Epoch: 12   Global Step: 157120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:35:46,243-Speed 3077.18 samples/sec   Loss 5.3695   LearningRate 0.0135   Epoch: 12   Global Step: 157130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:35:49,625-Speed 3027.85 samples/sec   Loss 5.3821   LearningRate 0.0135   Epoch: 12   Global Step: 157140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:35:52,931-Speed 3098.44 samples/sec   Loss 5.3526   LearningRate 0.0135   Epoch: 12   Global Step: 157150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:35:56,329-Speed 3013.96 samples/sec   Loss 5.3553   LearningRate 0.0135   Epoch: 12   Global Step: 157160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:35:59,774-Speed 2973.20 samples/sec   Loss 5.3724   LearningRate 0.0135   Epoch: 12   Global Step: 157170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:36:03,281-Speed 2921.29 samples/sec   Loss 5.3229   LearningRate 0.0135   Epoch: 12   Global Step: 157180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:36:06,708-Speed 2988.29 samples/sec   Loss 5.3978   LearningRate 0.0135   Epoch: 12   Global Step: 157190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:36:10,095-Speed 3024.50 samples/sec   Loss 5.2447   LearningRate 0.0135   Epoch: 12   Global Step: 157200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:36:13,549-Speed 2966.15 samples/sec   Loss 5.4167   LearningRate 0.0135   Epoch: 12   Global Step: 157210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:36:16,927-Speed 3031.94 samples/sec   Loss 5.3706   LearningRate 0.0135   Epoch: 12   Global Step: 157220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:36:20,409-Speed 2941.96 samples/sec   Loss 5.3512   LearningRate 0.0135   Epoch: 12   Global Step: 157230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:36:23,842-Speed 2983.48 samples/sec   Loss 5.4150   LearningRate 0.0135   Epoch: 12   Global Step: 157240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:36:27,278-Speed 2980.41 samples/sec   Loss 5.4312   LearningRate 0.0135   Epoch: 12   Global Step: 157250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:36:30,645-Speed 3042.72 samples/sec   Loss 5.3077   LearningRate 0.0135   Epoch: 12   Global Step: 157260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:36:34,011-Speed 3042.54 samples/sec   Loss 5.3708   LearningRate 0.0135   Epoch: 12   Global Step: 157270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:36:37,334-Speed 3082.92 samples/sec   Loss 5.4190   LearningRate 0.0135   Epoch: 12   Global Step: 157280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:36:40,754-Speed 2994.25 samples/sec   Loss 5.3224   LearningRate 0.0135   Epoch: 12   Global Step: 157290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:36:44,116-Speed 3047.34 samples/sec   Loss 5.4160   LearningRate 0.0135   Epoch: 12   Global Step: 157300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:36:47,535-Speed 2995.26 samples/sec   Loss 5.4198   LearningRate 0.0135   Epoch: 12   Global Step: 157310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:36:50,877-Speed 3065.39 samples/sec   Loss 5.3539   LearningRate 0.0134   Epoch: 12   Global Step: 157320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:36:54,207-Speed 3076.24 samples/sec   Loss 5.3663   LearningRate 0.0134   Epoch: 12   Global Step: 157330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:36:57,556-Speed 3057.81 samples/sec   Loss 5.2778   LearningRate 0.0134   Epoch: 12   Global Step: 157340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:37:00,867-Speed 3094.20 samples/sec   Loss 5.3641   LearningRate 0.0134   Epoch: 12   Global Step: 157350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:37:04,327-Speed 2959.81 samples/sec   Loss 5.4059   LearningRate 0.0134   Epoch: 12   Global Step: 157360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:37:07,712-Speed 3026.76 samples/sec   Loss 5.4307   LearningRate 0.0134   Epoch: 12   Global Step: 157370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:37:11,160-Speed 2970.85 samples/sec   Loss 5.3385   LearningRate 0.0134   Epoch: 12   Global Step: 157380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:37:14,607-Speed 2971.39 samples/sec   Loss 5.4445   LearningRate 0.0134   Epoch: 12   Global Step: 157390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:37:18,084-Speed 2945.95 samples/sec   Loss 5.4132   LearningRate 0.0134   Epoch: 12   Global Step: 157400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:37:21,471-Speed 3024.67 samples/sec   Loss 5.3768   LearningRate 0.0134   Epoch: 12   Global Step: 157410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:37:24,861-Speed 3021.55 samples/sec   Loss 5.2778   LearningRate 0.0134   Epoch: 12   Global Step: 157420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:37:28,307-Speed 2972.55 samples/sec   Loss 5.3743   LearningRate 0.0134   Epoch: 12   Global Step: 157430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:37:31,825-Speed 2911.43 samples/sec   Loss 5.3155   LearningRate 0.0134   Epoch: 12   Global Step: 157440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:37:35,255-Speed 2986.22 samples/sec   Loss 5.3533   LearningRate 0.0134   Epoch: 12   Global Step: 157450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:37:38,741-Speed 2937.67 samples/sec   Loss 5.5072   LearningRate 0.0134   Epoch: 12   Global Step: 157460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:37:42,130-Speed 3022.45 samples/sec   Loss 5.3156   LearningRate 0.0134   Epoch: 12   Global Step: 157470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:37:45,520-Speed 3021.56 samples/sec   Loss 5.4229   LearningRate 0.0134   Epoch: 12   Global Step: 157480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:37:48,916-Speed 3015.43 samples/sec   Loss 5.4004   LearningRate 0.0134   Epoch: 12   Global Step: 157490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:37:52,331-Speed 2999.45 samples/sec   Loss 5.3579   LearningRate 0.0134   Epoch: 12   Global Step: 157500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:37:55,731-Speed 3012.70 samples/sec   Loss 5.3530   LearningRate 0.0134   Epoch: 12   Global Step: 157510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:37:59,099-Speed 3040.92 samples/sec   Loss 5.3184   LearningRate 0.0134   Epoch: 12   Global Step: 157520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:02,452-Speed 3055.66 samples/sec   Loss 5.3337   LearningRate 0.0134   Epoch: 12   Global Step: 157530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:05,800-Speed 3058.85 samples/sec   Loss 5.2714   LearningRate 0.0134   Epoch: 12   Global Step: 157540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:09,164-Speed 3044.67 samples/sec   Loss 5.3866   LearningRate 0.0134   Epoch: 12   Global Step: 157550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:12,630-Speed 2955.63 samples/sec   Loss 5.2982   LearningRate 0.0134   Epoch: 12   Global Step: 157560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:15,997-Speed 3043.11 samples/sec   Loss 5.3790   LearningRate 0.0134   Epoch: 12   Global Step: 157570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:19,363-Speed 3042.43 samples/sec   Loss 5.4263   LearningRate 0.0134   Epoch: 12   Global Step: 157580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:22,769-Speed 3007.78 samples/sec   Loss 5.4723   LearningRate 0.0134   Epoch: 12   Global Step: 157590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:26,132-Speed 3045.84 samples/sec   Loss 5.3524   LearningRate 0.0134   Epoch: 12   Global Step: 157600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:29,593-Speed 2959.36 samples/sec   Loss 5.3218   LearningRate 0.0134   Epoch: 12   Global Step: 157610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:32,973-Speed 3030.55 samples/sec   Loss 5.3827   LearningRate 0.0134   Epoch: 12   Global Step: 157620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:36,344-Speed 3038.55 samples/sec   Loss 5.2849   LearningRate 0.0134   Epoch: 12   Global Step: 157630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:39,699-Speed 3053.33 samples/sec   Loss 5.3593   LearningRate 0.0134   Epoch: 12   Global Step: 157640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:43,035-Speed 3070.35 samples/sec   Loss 5.3134   LearningRate 0.0134   Epoch: 12   Global Step: 157650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:38:46,406-Speed 3038.96 samples/sec   Loss 5.3468   LearningRate 0.0133   Epoch: 12   Global Step: 157660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:38:49,828-Speed 2993.69 samples/sec   Loss 5.3285   LearningRate 0.0133   Epoch: 12   Global Step: 157670   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:38:53,162-Speed 3071.76 samples/sec   Loss 5.3569   LearningRate 0.0133   Epoch: 12   Global Step: 157680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:38:56,501-Speed 3067.48 samples/sec   Loss 5.3156   LearningRate 0.0133   Epoch: 12   Global Step: 157690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:38:59,849-Speed 3059.36 samples/sec   Loss 5.4169   LearningRate 0.0133   Epoch: 12   Global Step: 157700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:39:03,166-Speed 3088.89 samples/sec   Loss 5.3844   LearningRate 0.0133   Epoch: 12   Global Step: 157710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:39:06,581-Speed 2998.91 samples/sec   Loss 5.3035   LearningRate 0.0133   Epoch: 12   Global Step: 157720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:39:09,928-Speed 3059.85 samples/sec   Loss 5.4197   LearningRate 0.0133   Epoch: 12   Global Step: 157730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:39:13,258-Speed 3076.52 samples/sec   Loss 5.3115   LearningRate 0.0133   Epoch: 12   Global Step: 157740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:39:16,571-Speed 3091.09 samples/sec   Loss 5.3699   LearningRate 0.0133   Epoch: 12   Global Step: 157750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:39:19,938-Speed 3042.25 samples/sec   Loss 5.4316   LearningRate 0.0133   Epoch: 12   Global Step: 157760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:39:23,270-Speed 3074.11 samples/sec   Loss 5.3407   LearningRate 0.0133   Epoch: 12   Global Step: 157770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:39:26,695-Speed 2990.69 samples/sec   Loss 5.3722   LearningRate 0.0133   Epoch: 12   Global Step: 157780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:39:30,128-Speed 2983.46 samples/sec   Loss 5.4447   LearningRate 0.0133   Epoch: 12   Global Step: 157790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:39:33,494-Speed 3043.55 samples/sec   Loss 5.3455   LearningRate 0.0133   Epoch: 12   Global Step: 157800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:39:36,877-Speed 3027.72 samples/sec   Loss 5.2174   LearningRate 0.0133   Epoch: 12   Global Step: 157810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:39:40,327-Speed 2968.76 samples/sec   Loss 5.3861   LearningRate 0.0133   Epoch: 12   Global Step: 157820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:39:43,744-Speed 2997.28 samples/sec   Loss 5.3422   LearningRate 0.0133   Epoch: 12   Global Step: 157830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:39:47,228-Speed 2940.46 samples/sec   Loss 5.2859   LearningRate 0.0133   Epoch: 12   Global Step: 157840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:39:50,585-Speed 3051.32 samples/sec   Loss 5.2802   LearningRate 0.0133   Epoch: 12   Global Step: 157850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:39:53,929-Speed 3062.69 samples/sec   Loss 5.2938   LearningRate 0.0133   Epoch: 12   Global Step: 157860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:39:57,318-Speed 3022.05 samples/sec   Loss 5.3155   LearningRate 0.0133   Epoch: 12   Global Step: 157870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:40:00,671-Speed 3054.83 samples/sec   Loss 5.4006   LearningRate 0.0133   Epoch: 12   Global Step: 157880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:40:04,069-Speed 3014.74 samples/sec   Loss 5.3266   LearningRate 0.0133   Epoch: 12   Global Step: 157890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:40:07,376-Speed 3096.84 samples/sec   Loss 5.2917   LearningRate 0.0133   Epoch: 12   Global Step: 157900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:40:10,692-Speed 3089.43 samples/sec   Loss 5.3989   LearningRate 0.0133   Epoch: 12   Global Step: 157910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:40:13,990-Speed 3105.88 samples/sec   Loss 5.3053   LearningRate 0.0133   Epoch: 12   Global Step: 157920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:40:17,389-Speed 3013.79 samples/sec   Loss 5.3001   LearningRate 0.0133   Epoch: 12   Global Step: 157930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:40:20,732-Speed 3064.24 samples/sec   Loss 5.3249   LearningRate 0.0133   Epoch: 12   Global Step: 157940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:40:24,199-Speed 2954.06 samples/sec   Loss 5.3183   LearningRate 0.0133   Epoch: 12   Global Step: 157950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:40:27,564-Speed 3044.51 samples/sec   Loss 5.3826   LearningRate 0.0133   Epoch: 12   Global Step: 157960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:40:30,946-Speed 3029.04 samples/sec   Loss 5.3085   LearningRate 0.0133   Epoch: 12   Global Step: 157970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:40:34,328-Speed 3028.21 samples/sec   Loss 5.3205   LearningRate 0.0133   Epoch: 12   Global Step: 157980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:40:37,722-Speed 3018.21 samples/sec   Loss 5.3364   LearningRate 0.0133   Epoch: 12   Global Step: 157990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:40:41,028-Speed 3098.07 samples/sec   Loss 5.3887   LearningRate 0.0132   Epoch: 12   Global Step: 158000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:40:44,358-Speed 3075.95 samples/sec   Loss 5.4714   LearningRate 0.0132   Epoch: 12   Global Step: 158010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:40:47,707-Speed 3058.13 samples/sec   Loss 5.4390   LearningRate 0.0132   Epoch: 12   Global Step: 158020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:40:51,080-Speed 3036.74 samples/sec   Loss 5.3578   LearningRate 0.0132   Epoch: 12   Global Step: 158030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:40:54,476-Speed 3015.80 samples/sec   Loss 5.2621   LearningRate 0.0132   Epoch: 12   Global Step: 158040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:40:57,808-Speed 3074.52 samples/sec   Loss 5.3965   LearningRate 0.0132   Epoch: 12   Global Step: 158050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:41:01,169-Speed 3047.72 samples/sec   Loss 5.4031   LearningRate 0.0132   Epoch: 12   Global Step: 158060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:41:04,556-Speed 3023.79 samples/sec   Loss 5.3332   LearningRate 0.0132   Epoch: 12   Global Step: 158070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:41:07,946-Speed 3022.07 samples/sec   Loss 5.2881   LearningRate 0.0132   Epoch: 12   Global Step: 158080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:41:11,312-Speed 3042.57 samples/sec   Loss 5.3132   LearningRate 0.0132   Epoch: 12   Global Step: 158090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:41:14,622-Speed 3094.90 samples/sec   Loss 5.3371   LearningRate 0.0132   Epoch: 12   Global Step: 158100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:41:17,985-Speed 3045.32 samples/sec   Loss 5.4115   LearningRate 0.0132   Epoch: 12   Global Step: 158110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:41:21,361-Speed 3033.67 samples/sec   Loss 5.2717   LearningRate 0.0132   Epoch: 12   Global Step: 158120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 16:41:24,725-Speed 3045.19 samples/sec   Loss 5.3798   LearningRate 0.0132   Epoch: 12   Global Step: 158130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 16:41:28,068-Speed 3064.42 samples/sec   Loss 5.3166   LearningRate 0.0132   Epoch: 12   Global Step: 158140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:41:31,407-Speed 3067.41 samples/sec   Loss 5.3668   LearningRate 0.0132   Epoch: 12   Global Step: 158150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:41:34,774-Speed 3041.82 samples/sec   Loss 5.4530   LearningRate 0.0132   Epoch: 12   Global Step: 158160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:41:38,201-Speed 2989.39 samples/sec   Loss 5.3611   LearningRate 0.0132   Epoch: 12   Global Step: 158170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:41:41,514-Speed 3091.28 samples/sec   Loss 5.4518   LearningRate 0.0132   Epoch: 12   Global Step: 158180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:41:44,880-Speed 3043.15 samples/sec   Loss 5.5063   LearningRate 0.0132   Epoch: 12   Global Step: 158190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:41:48,283-Speed 3010.29 samples/sec   Loss 5.3838   LearningRate 0.0132   Epoch: 12   Global Step: 158200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:41:51,622-Speed 3067.18 samples/sec   Loss 5.2853   LearningRate 0.0132   Epoch: 12   Global Step: 158210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:41:55,040-Speed 2996.63 samples/sec   Loss 5.3302   LearningRate 0.0132   Epoch: 12   Global Step: 158220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:41:58,452-Speed 3002.59 samples/sec   Loss 5.3609   LearningRate 0.0132   Epoch: 12   Global Step: 158230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:42:01,825-Speed 3036.81 samples/sec   Loss 5.4719   LearningRate 0.0132   Epoch: 12   Global Step: 158240   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:42:05,245-Speed 2995.22 samples/sec   Loss 5.3522   LearningRate 0.0132   Epoch: 12   Global Step: 158250   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:42:08,629-Speed 3027.18 samples/sec   Loss 5.2450   LearningRate 0.0132   Epoch: 12   Global Step: 158260   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:42:12,022-Speed 3018.41 samples/sec   Loss 5.1341   LearningRate 0.0132   Epoch: 12   Global Step: 158270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:42:15,437-Speed 2999.13 samples/sec   Loss 5.3154   LearningRate 0.0132   Epoch: 12   Global Step: 158280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:42:18,833-Speed 3016.13 samples/sec   Loss 5.2673   LearningRate 0.0132   Epoch: 12   Global Step: 158290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 16:42:22,280-Speed 2971.59 samples/sec   Loss 5.3909   LearningRate 0.0132   Epoch: 12   Global Step: 158300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:42:25,731-Speed 2967.77 samples/sec   Loss 5.4237   LearningRate 0.0132   Epoch: 12   Global Step: 158310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:42:29,116-Speed 3026.14 samples/sec   Loss 5.3144   LearningRate 0.0132   Epoch: 12   Global Step: 158320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:42:32,466-Speed 3057.96 samples/sec   Loss 5.3130   LearningRate 0.0132   Epoch: 12   Global Step: 158330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:42:35,878-Speed 3002.10 samples/sec   Loss 5.2503   LearningRate 0.0131   Epoch: 12   Global Step: 158340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:42:39,303-Speed 2989.78 samples/sec   Loss 5.4294   LearningRate 0.0131   Epoch: 12   Global Step: 158350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 16:42:42,676-Speed 3037.30 samples/sec   Loss 5.4235   LearningRate 0.0131   Epoch: 12   Global Step: 158360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:42:46,072-Speed 3016.08 samples/sec   Loss 5.3355   LearningRate 0.0131   Epoch: 12   Global Step: 158370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:42:49,482-Speed 3003.99 samples/sec   Loss 5.2999   LearningRate 0.0131   Epoch: 12   Global Step: 158380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:42:52,868-Speed 3024.97 samples/sec   Loss 5.2665   LearningRate 0.0131   Epoch: 12   Global Step: 158390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:42:56,184-Speed 3089.03 samples/sec   Loss 5.4195   LearningRate 0.0131   Epoch: 12   Global Step: 158400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:42:59,547-Speed 3046.01 samples/sec   Loss 5.4017   LearningRate 0.0131   Epoch: 12   Global Step: 158410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:43:02,918-Speed 3038.51 samples/sec   Loss 5.2725   LearningRate 0.0131   Epoch: 12   Global Step: 158420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:43:06,324-Speed 3007.18 samples/sec   Loss 5.2924   LearningRate 0.0131   Epoch: 12   Global Step: 158430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:43:09,653-Speed 3076.48 samples/sec   Loss 5.2845   LearningRate 0.0131   Epoch: 12   Global Step: 158440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:43:13,079-Speed 2990.19 samples/sec   Loss 5.3810   LearningRate 0.0131   Epoch: 12   Global Step: 158450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:43:16,401-Speed 3083.73 samples/sec   Loss 5.3936   LearningRate 0.0131   Epoch: 12   Global Step: 158460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:43:19,750-Speed 3058.31 samples/sec   Loss 5.4251   LearningRate 0.0131   Epoch: 12   Global Step: 158470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:43:23,225-Speed 2948.14 samples/sec   Loss 5.3470   LearningRate 0.0131   Epoch: 12   Global Step: 158480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:43:26,652-Speed 2988.30 samples/sec   Loss 5.3146   LearningRate 0.0131   Epoch: 12   Global Step: 158490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:43:30,065-Speed 3001.40 samples/sec   Loss 5.3445   LearningRate 0.0131   Epoch: 12   Global Step: 158500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:43:33,413-Speed 3060.15 samples/sec   Loss 5.3493   LearningRate 0.0131   Epoch: 12   Global Step: 158510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:43:36,740-Speed 3078.21 samples/sec   Loss 5.4217   LearningRate 0.0131   Epoch: 12   Global Step: 158520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:43:40,096-Speed 3052.75 samples/sec   Loss 5.3179   LearningRate 0.0131   Epoch: 12   Global Step: 158530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:43:43,490-Speed 3017.88 samples/sec   Loss 5.3309   LearningRate 0.0131   Epoch: 12   Global Step: 158540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:43:46,855-Speed 3043.64 samples/sec   Loss 5.3363   LearningRate 0.0131   Epoch: 12   Global Step: 158550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:43:50,305-Speed 2968.82 samples/sec   Loss 5.3963   LearningRate 0.0131   Epoch: 12   Global Step: 158560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:43:53,767-Speed 2958.39 samples/sec   Loss 5.3635   LearningRate 0.0131   Epoch: 12   Global Step: 158570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:43:57,176-Speed 3005.03 samples/sec   Loss 5.3173   LearningRate 0.0131   Epoch: 12   Global Step: 158580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:44:00,588-Speed 3002.18 samples/sec   Loss 5.4170   LearningRate 0.0131   Epoch: 12   Global Step: 158590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:44:03,991-Speed 3009.92 samples/sec   Loss 5.3062   LearningRate 0.0131   Epoch: 12   Global Step: 158600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:44:07,352-Speed 3047.39 samples/sec   Loss 5.3479   LearningRate 0.0131   Epoch: 12   Global Step: 158610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:44:10,750-Speed 3014.86 samples/sec   Loss 5.3501   LearningRate 0.0131   Epoch: 12   Global Step: 158620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:44:14,117-Speed 3042.27 samples/sec   Loss 5.3927   LearningRate 0.0131   Epoch: 12   Global Step: 158630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:44:17,447-Speed 3075.67 samples/sec   Loss 5.3264   LearningRate 0.0131   Epoch: 12   Global Step: 158640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:44:20,781-Speed 3072.23 samples/sec   Loss 5.2570   LearningRate 0.0131   Epoch: 12   Global Step: 158650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:44:24,129-Speed 3059.17 samples/sec   Loss 5.3245   LearningRate 0.0131   Epoch: 12   Global Step: 158660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:44:27,453-Speed 3081.95 samples/sec   Loss 5.4010   LearningRate 0.0131   Epoch: 12   Global Step: 158670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:44:30,783-Speed 3075.96 samples/sec   Loss 5.3607   LearningRate 0.0130   Epoch: 12   Global Step: 158680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:44:34,139-Speed 3052.12 samples/sec   Loss 5.2628   LearningRate 0.0130   Epoch: 12   Global Step: 158690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:44:37,520-Speed 3029.07 samples/sec   Loss 5.2612   LearningRate 0.0130   Epoch: 12   Global Step: 158700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:44:40,873-Speed 3054.67 samples/sec   Loss 5.2891   LearningRate 0.0130   Epoch: 12   Global Step: 158710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:44:44,219-Speed 3061.56 samples/sec   Loss 5.3134   LearningRate 0.0130   Epoch: 12   Global Step: 158720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:44:47,551-Speed 3073.77 samples/sec   Loss 5.2033   LearningRate 0.0130   Epoch: 12   Global Step: 158730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:44:50,935-Speed 3026.79 samples/sec   Loss 5.2678   LearningRate 0.0130   Epoch: 12   Global Step: 158740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:44:54,308-Speed 3037.03 samples/sec   Loss 5.3293   LearningRate 0.0130   Epoch: 12   Global Step: 158750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:44:57,679-Speed 3038.72 samples/sec   Loss 5.3228   LearningRate 0.0130   Epoch: 12   Global Step: 158760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:45:01,022-Speed 3064.16 samples/sec   Loss 5.3782   LearningRate 0.0130   Epoch: 12   Global Step: 158770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:45:04,369-Speed 3059.55 samples/sec   Loss 5.3414   LearningRate 0.0130   Epoch: 12   Global Step: 158780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:45:07,704-Speed 3071.30 samples/sec   Loss 5.2893   LearningRate 0.0130   Epoch: 12   Global Step: 158790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:45:11,080-Speed 3034.41 samples/sec   Loss 5.2746   LearningRate 0.0130   Epoch: 12   Global Step: 158800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 16:45:14,450-Speed 3039.17 samples/sec   Loss 5.3348   LearningRate 0.0130   Epoch: 12   Global Step: 158810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:45:17,861-Speed 3003.18 samples/sec   Loss 5.2939   LearningRate 0.0130   Epoch: 12   Global Step: 158820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:45:21,217-Speed 3052.16 samples/sec   Loss 5.3966   LearningRate 0.0130   Epoch: 12   Global Step: 158830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:45:24,594-Speed 3032.96 samples/sec   Loss 5.2533   LearningRate 0.0130   Epoch: 12   Global Step: 158840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:45:28,020-Speed 2989.51 samples/sec   Loss 5.3283   LearningRate 0.0130   Epoch: 12   Global Step: 158850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:45:31,409-Speed 3023.21 samples/sec   Loss 5.2418   LearningRate 0.0130   Epoch: 12   Global Step: 158860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:45:34,765-Speed 3051.87 samples/sec   Loss 5.2442   LearningRate 0.0130   Epoch: 12   Global Step: 158870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:45:38,117-Speed 3055.21 samples/sec   Loss 5.3658   LearningRate 0.0130   Epoch: 12   Global Step: 158880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:45:41,449-Speed 3074.00 samples/sec   Loss 5.3157   LearningRate 0.0130   Epoch: 12   Global Step: 158890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:45:44,805-Speed 3052.42 samples/sec   Loss 5.4485   LearningRate 0.0130   Epoch: 12   Global Step: 158900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:45:48,233-Speed 2988.60 samples/sec   Loss 5.1363   LearningRate 0.0130   Epoch: 12   Global Step: 158910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:45:51,733-Speed 2925.76 samples/sec   Loss 5.3728   LearningRate 0.0130   Epoch: 12   Global Step: 158920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:45:55,079-Speed 3061.64 samples/sec   Loss 5.3284   LearningRate 0.0130   Epoch: 12   Global Step: 158930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:45:58,448-Speed 3040.32 samples/sec   Loss 5.3435   LearningRate 0.0130   Epoch: 12   Global Step: 158940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:46:01,833-Speed 3027.04 samples/sec   Loss 5.2959   LearningRate 0.0130   Epoch: 12   Global Step: 158950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:05,244-Speed 3002.44 samples/sec   Loss 5.4041   LearningRate 0.0130   Epoch: 12   Global Step: 158960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:08,595-Speed 3057.03 samples/sec   Loss 5.2753   LearningRate 0.0130   Epoch: 12   Global Step: 158970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:11,959-Speed 3044.60 samples/sec   Loss 5.3071   LearningRate 0.0130   Epoch: 12   Global Step: 158980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:15,356-Speed 3016.10 samples/sec   Loss 5.2995   LearningRate 0.0130   Epoch: 12   Global Step: 158990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:18,697-Speed 3065.54 samples/sec   Loss 5.3621   LearningRate 0.0130   Epoch: 12   Global Step: 159000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:22,047-Speed 3057.62 samples/sec   Loss 5.3599   LearningRate 0.0130   Epoch: 12   Global Step: 159010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:25,421-Speed 3035.93 samples/sec   Loss 5.2790   LearningRate 0.0130   Epoch: 12   Global Step: 159020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:28,844-Speed 2992.35 samples/sec   Loss 5.2553   LearningRate 0.0129   Epoch: 12   Global Step: 159030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:32,173-Speed 3076.07 samples/sec   Loss 5.3896   LearningRate 0.0129   Epoch: 12   Global Step: 159040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:35,501-Speed 3077.91 samples/sec   Loss 5.3030   LearningRate 0.0129   Epoch: 12   Global Step: 159050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:46:38,878-Speed 3033.54 samples/sec   Loss 5.3882   LearningRate 0.0129   Epoch: 12   Global Step: 159060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:46:42,236-Speed 3050.39 samples/sec   Loss 5.2840   LearningRate 0.0129   Epoch: 12   Global Step: 159070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:46:45,609-Speed 3036.76 samples/sec   Loss 5.3894   LearningRate 0.0129   Epoch: 12   Global Step: 159080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:46:49,003-Speed 3018.20 samples/sec   Loss 5.2752   LearningRate 0.0129   Epoch: 12   Global Step: 159090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:52,426-Speed 2991.79 samples/sec   Loss 5.3726   LearningRate 0.0129   Epoch: 12   Global Step: 159100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:55,859-Speed 2983.58 samples/sec   Loss 5.3393   LearningRate 0.0129   Epoch: 12   Global Step: 159110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:46:59,275-Speed 2999.09 samples/sec   Loss 5.3995   LearningRate 0.0129   Epoch: 12   Global Step: 159120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:02,725-Speed 2969.04 samples/sec   Loss 5.4260   LearningRate 0.0129   Epoch: 12   Global Step: 159130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:06,131-Speed 3006.56 samples/sec   Loss 5.3328   LearningRate 0.0129   Epoch: 12   Global Step: 159140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:09,550-Speed 2996.52 samples/sec   Loss 5.2560   LearningRate 0.0129   Epoch: 12   Global Step: 159150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:12,900-Speed 3057.51 samples/sec   Loss 5.3790   LearningRate 0.0129   Epoch: 12   Global Step: 159160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:16,282-Speed 3028.28 samples/sec   Loss 5.2858   LearningRate 0.0129   Epoch: 12   Global Step: 159170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:19,717-Speed 2981.79 samples/sec   Loss 5.2265   LearningRate 0.0129   Epoch: 12   Global Step: 159180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:23,161-Speed 2974.14 samples/sec   Loss 5.3012   LearningRate 0.0129   Epoch: 12   Global Step: 159190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:47:26,574-Speed 3001.16 samples/sec   Loss 5.2143   LearningRate 0.0129   Epoch: 12   Global Step: 159200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:29,945-Speed 3038.55 samples/sec   Loss 5.2556   LearningRate 0.0129   Epoch: 12   Global Step: 159210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:33,279-Speed 3072.34 samples/sec   Loss 5.3436   LearningRate 0.0129   Epoch: 12   Global Step: 159220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:36,720-Speed 2976.38 samples/sec   Loss 5.2586   LearningRate 0.0129   Epoch: 12   Global Step: 159230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:40,143-Speed 2992.25 samples/sec   Loss 5.2739   LearningRate 0.0129   Epoch: 12   Global Step: 159240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:43,537-Speed 3018.23 samples/sec   Loss 5.3831   LearningRate 0.0129   Epoch: 12   Global Step: 159250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:47:46,915-Speed 3032.34 samples/sec   Loss 5.2754   LearningRate 0.0129   Epoch: 12   Global Step: 159260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:47:50,413-Speed 2927.96 samples/sec   Loss 5.2681   LearningRate 0.0129   Epoch: 12   Global Step: 159270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:47:53,829-Speed 2998.78 samples/sec   Loss 5.2479   LearningRate 0.0129   Epoch: 12   Global Step: 159280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:47:57,157-Speed 3078.02 samples/sec   Loss 5.3280   LearningRate 0.0129   Epoch: 12   Global Step: 159290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:48:00,551-Speed 3018.19 samples/sec   Loss 5.3287   LearningRate 0.0129   Epoch: 12   Global Step: 159300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:48:04,021-Speed 2951.86 samples/sec   Loss 5.2642   LearningRate 0.0129   Epoch: 12   Global Step: 159310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:48:07,377-Speed 3052.31 samples/sec   Loss 5.1949   LearningRate 0.0129   Epoch: 12   Global Step: 159320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:48:10,836-Speed 2960.40 samples/sec   Loss 5.3685   LearningRate 0.0129   Epoch: 12   Global Step: 159330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:48:14,243-Speed 3006.98 samples/sec   Loss 5.2950   LearningRate 0.0129   Epoch: 12   Global Step: 159340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:48:17,722-Speed 2943.56 samples/sec   Loss 5.3772   LearningRate 0.0129   Epoch: 12   Global Step: 159350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:48:21,142-Speed 2994.99 samples/sec   Loss 5.2943   LearningRate 0.0129   Epoch: 12   Global Step: 159360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:48:24,552-Speed 3003.71 samples/sec   Loss 5.4783   LearningRate 0.0128   Epoch: 12   Global Step: 159370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:48:28,015-Speed 2958.45 samples/sec   Loss 5.3242   LearningRate 0.0128   Epoch: 12   Global Step: 159380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:48:31,426-Speed 3002.57 samples/sec   Loss 5.3021   LearningRate 0.0128   Epoch: 12   Global Step: 159390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:48:34,861-Speed 2982.09 samples/sec   Loss 5.3285   LearningRate 0.0128   Epoch: 12   Global Step: 159400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:48:38,271-Speed 3003.89 samples/sec   Loss 5.3028   LearningRate 0.0128   Epoch: 12   Global Step: 159410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:48:41,690-Speed 2995.98 samples/sec   Loss 5.3000   LearningRate 0.0128   Epoch: 12   Global Step: 159420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:48:45,013-Speed 3082.31 samples/sec   Loss 5.2999   LearningRate 0.0128   Epoch: 12   Global Step: 159430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:48:48,341-Speed 3077.41 samples/sec   Loss 5.2947   LearningRate 0.0128   Epoch: 12   Global Step: 159440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:48:51,757-Speed 2998.54 samples/sec   Loss 5.2686   LearningRate 0.0128   Epoch: 12   Global Step: 159450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:48:55,134-Speed 3032.98 samples/sec   Loss 5.2341   LearningRate 0.0128   Epoch: 12   Global Step: 159460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:48:58,558-Speed 2992.10 samples/sec   Loss 5.3058   LearningRate 0.0128   Epoch: 12   Global Step: 159470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:49:01,987-Speed 2987.59 samples/sec   Loss 5.2178   LearningRate 0.0128   Epoch: 12   Global Step: 159480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:49:05,372-Speed 3026.15 samples/sec   Loss 5.3643   LearningRate 0.0128   Epoch: 12   Global Step: 159490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:49:08,792-Speed 2994.71 samples/sec   Loss 5.2300   LearningRate 0.0128   Epoch: 12   Global Step: 159500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:49:12,189-Speed 3015.22 samples/sec   Loss 5.2969   LearningRate 0.0128   Epoch: 12   Global Step: 159510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:49:15,564-Speed 3034.58 samples/sec   Loss 5.2979   LearningRate 0.0128   Epoch: 12   Global Step: 159520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:49:18,914-Speed 3058.11 samples/sec   Loss 5.2574   LearningRate 0.0128   Epoch: 12   Global Step: 159530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:49:22,281-Speed 3041.90 samples/sec   Loss 5.3201   LearningRate 0.0128   Epoch: 12   Global Step: 159540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:49:25,619-Speed 3069.37 samples/sec   Loss 5.2040   LearningRate 0.0128   Epoch: 12   Global Step: 159550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:49:29,015-Speed 3016.49 samples/sec   Loss 5.3873   LearningRate 0.0128   Epoch: 12   Global Step: 159560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:49:32,478-Speed 2957.71 samples/sec   Loss 5.3009   LearningRate 0.0128   Epoch: 12   Global Step: 159570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:49:35,837-Speed 3049.38 samples/sec   Loss 5.2961   LearningRate 0.0128   Epoch: 12   Global Step: 159580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:49:39,242-Speed 3008.30 samples/sec   Loss 5.3132   LearningRate 0.0128   Epoch: 12   Global Step: 159590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:49:42,638-Speed 3016.24 samples/sec   Loss 5.4018   LearningRate 0.0128   Epoch: 12   Global Step: 159600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:49:45,983-Speed 3062.13 samples/sec   Loss 5.2505   LearningRate 0.0128   Epoch: 12   Global Step: 159610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:49:49,406-Speed 2992.08 samples/sec   Loss 5.3003   LearningRate 0.0128   Epoch: 12   Global Step: 159620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:49:52,742-Speed 3071.07 samples/sec   Loss 5.3375   LearningRate 0.0128   Epoch: 12   Global Step: 159630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:49:56,146-Speed 3008.49 samples/sec   Loss 5.2946   LearningRate 0.0128   Epoch: 12   Global Step: 159640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:49:59,524-Speed 3032.74 samples/sec   Loss 5.1931   LearningRate 0.0128   Epoch: 12   Global Step: 159650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:50:02,846-Speed 3083.19 samples/sec   Loss 5.3752   LearningRate 0.0128   Epoch: 12   Global Step: 159660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:50:06,253-Speed 3005.98 samples/sec   Loss 5.3216   LearningRate 0.0128   Epoch: 12   Global Step: 159670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:09,651-Speed 3014.50 samples/sec   Loss 5.3708   LearningRate 0.0128   Epoch: 12   Global Step: 159680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:13,006-Speed 3052.73 samples/sec   Loss 5.2627   LearningRate 0.0128   Epoch: 12   Global Step: 159690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:16,405-Speed 3014.44 samples/sec   Loss 5.3166   LearningRate 0.0128   Epoch: 12   Global Step: 159700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:19,880-Speed 2947.45 samples/sec   Loss 5.4503   LearningRate 0.0128   Epoch: 12   Global Step: 159710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:23,337-Speed 2962.91 samples/sec   Loss 5.3062   LearningRate 0.0127   Epoch: 12   Global Step: 159720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:26,713-Speed 3034.56 samples/sec   Loss 5.3534   LearningRate 0.0127   Epoch: 12   Global Step: 159730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:30,127-Speed 2999.84 samples/sec   Loss 5.2911   LearningRate 0.0127   Epoch: 12   Global Step: 159740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:33,531-Speed 3009.48 samples/sec   Loss 5.3521   LearningRate 0.0127   Epoch: 12   Global Step: 159750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:36,982-Speed 2968.99 samples/sec   Loss 5.2322   LearningRate 0.0127   Epoch: 12   Global Step: 159760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:40,404-Speed 2992.77 samples/sec   Loss 5.3075   LearningRate 0.0127   Epoch: 12   Global Step: 159770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:50:43,801-Speed 3015.85 samples/sec   Loss 5.1583   LearningRate 0.0127   Epoch: 12   Global Step: 159780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:47,142-Speed 3066.61 samples/sec   Loss 5.4086   LearningRate 0.0127   Epoch: 12   Global Step: 159790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:50,510-Speed 3040.85 samples/sec   Loss 5.2788   LearningRate 0.0127   Epoch: 12   Global Step: 159800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:53,942-Speed 2984.74 samples/sec   Loss 5.2218   LearningRate 0.0127   Epoch: 12   Global Step: 159810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:50:57,305-Speed 3045.69 samples/sec   Loss 5.4152   LearningRate 0.0127   Epoch: 12   Global Step: 159820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:00,722-Speed 2997.42 samples/sec   Loss 5.1722   LearningRate 0.0127   Epoch: 12   Global Step: 159830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:04,177-Speed 2964.84 samples/sec   Loss 5.3646   LearningRate 0.0127   Epoch: 12   Global Step: 159840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:07,550-Speed 3037.00 samples/sec   Loss 5.1962   LearningRate 0.0127   Epoch: 12   Global Step: 159850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:10,922-Speed 3037.28 samples/sec   Loss 5.2182   LearningRate 0.0127   Epoch: 12   Global Step: 159860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:14,293-Speed 3038.03 samples/sec   Loss 5.2638   LearningRate 0.0127   Epoch: 12   Global Step: 159870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:17,639-Speed 3062.08 samples/sec   Loss 5.2784   LearningRate 0.0127   Epoch: 12   Global Step: 159880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:51:21,008-Speed 3039.63 samples/sec   Loss 5.1765   LearningRate 0.0127   Epoch: 12   Global Step: 159890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:24,473-Speed 2956.23 samples/sec   Loss 5.3096   LearningRate 0.0127   Epoch: 12   Global Step: 159900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:27,863-Speed 3021.80 samples/sec   Loss 5.1860   LearningRate 0.0127   Epoch: 12   Global Step: 159910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:31,390-Speed 2903.74 samples/sec   Loss 5.2645   LearningRate 0.0127   Epoch: 12   Global Step: 159920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:34,812-Speed 2993.67 samples/sec   Loss 5.2301   LearningRate 0.0127   Epoch: 12   Global Step: 159930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:38,175-Speed 3045.42 samples/sec   Loss 5.2935   LearningRate 0.0127   Epoch: 12   Global Step: 159940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:41,554-Speed 3031.53 samples/sec   Loss 5.2531   LearningRate 0.0127   Epoch: 12   Global Step: 159950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:44,983-Speed 2987.41 samples/sec   Loss 5.3530   LearningRate 0.0127   Epoch: 12   Global Step: 159960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:48,453-Speed 2951.31 samples/sec   Loss 5.4850   LearningRate 0.0127   Epoch: 12   Global Step: 159970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:51,894-Speed 2976.97 samples/sec   Loss 5.2629   LearningRate 0.0127   Epoch: 12   Global Step: 159980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:51:55,352-Speed 2962.39 samples/sec   Loss 5.3039   LearningRate 0.0127   Epoch: 12   Global Step: 159990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:51:58,826-Speed 2948.40 samples/sec   Loss 5.2154   LearningRate 0.0127   Epoch: 12   Global Step: 160000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:52:02,192-Speed 3043.57 samples/sec   Loss 5.2168   LearningRate 0.0127   Epoch: 12   Global Step: 160010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:52:05,671-Speed 2944.15 samples/sec   Loss 5.3403   LearningRate 0.0127   Epoch: 12   Global Step: 160020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:52:09,045-Speed 3035.87 samples/sec   Loss 5.2565   LearningRate 0.0127   Epoch: 12   Global Step: 160030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:52:12,498-Speed 2966.50 samples/sec   Loss 5.1999   LearningRate 0.0127   Epoch: 12   Global Step: 160040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:52:15,869-Speed 3038.07 samples/sec   Loss 5.2838   LearningRate 0.0127   Epoch: 12   Global Step: 160050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:52:19,269-Speed 3012.91 samples/sec   Loss 5.2244   LearningRate 0.0127   Epoch: 12   Global Step: 160060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:52:22,612-Speed 3063.92 samples/sec   Loss 5.2179   LearningRate 0.0126   Epoch: 12   Global Step: 160070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:52:26,003-Speed 3020.38 samples/sec   Loss 5.2597   LearningRate 0.0126   Epoch: 12   Global Step: 160080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:52:29,356-Speed 3055.23 samples/sec   Loss 5.3101   LearningRate 0.0126   Epoch: 12   Global Step: 160090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:52:32,728-Speed 3038.06 samples/sec   Loss 5.3306   LearningRate 0.0126   Epoch: 12   Global Step: 160100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:52:36,111-Speed 3026.91 samples/sec   Loss 5.3281   LearningRate 0.0126   Epoch: 12   Global Step: 160110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:52:39,467-Speed 3052.23 samples/sec   Loss 5.2827   LearningRate 0.0126   Epoch: 12   Global Step: 160120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:52:42,802-Speed 3072.01 samples/sec   Loss 5.3202   LearningRate 0.0126   Epoch: 12   Global Step: 160130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:52:46,243-Speed 2976.37 samples/sec   Loss 5.3372   LearningRate 0.0126   Epoch: 12   Global Step: 160140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:52:49,621-Speed 3031.81 samples/sec   Loss 5.3343   LearningRate 0.0126   Epoch: 12   Global Step: 160150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:52:53,065-Speed 2974.46 samples/sec   Loss 5.2906   LearningRate 0.0126   Epoch: 12   Global Step: 160160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:52:56,459-Speed 3017.65 samples/sec   Loss 5.3190   LearningRate 0.0126   Epoch: 12   Global Step: 160170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:52:59,829-Speed 3039.79 samples/sec   Loss 5.2541   LearningRate 0.0126   Epoch: 12   Global Step: 160180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:53:03,209-Speed 3029.77 samples/sec   Loss 5.2946   LearningRate 0.0126   Epoch: 12   Global Step: 160190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:53:06,602-Speed 3019.48 samples/sec   Loss 5.2344   LearningRate 0.0126   Epoch: 12   Global Step: 160200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:53:09,940-Speed 3068.91 samples/sec   Loss 5.3049   LearningRate 0.0126   Epoch: 12   Global Step: 160210   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 16:53:13,283-Speed 3063.83 samples/sec   Loss 5.2656   LearningRate 0.0126   Epoch: 12   Global Step: 160220   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 16:53:16,673-Speed 3020.99 samples/sec   Loss 5.2602   LearningRate 0.0126   Epoch: 12   Global Step: 160230   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 16:53:20,031-Speed 3050.64 samples/sec   Loss 5.2811   LearningRate 0.0126   Epoch: 12   Global Step: 160240   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 16:53:23,396-Speed 3043.98 samples/sec   Loss 5.3675   LearningRate 0.0126   Epoch: 12   Global Step: 160250   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 16:53:26,747-Speed 3056.93 samples/sec   Loss 5.2590   LearningRate 0.0126   Epoch: 12   Global Step: 160260   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 16:53:30,160-Speed 3001.47 samples/sec   Loss 5.2346   LearningRate 0.0126   Epoch: 12   Global Step: 160270   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 16:53:33,539-Speed 3030.57 samples/sec   Loss 5.3146   LearningRate 0.0126   Epoch: 12   Global Step: 160280   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 16:53:36,866-Speed 3079.01 samples/sec   Loss 5.2681   LearningRate 0.0126   Epoch: 12   Global Step: 160290   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 16:53:40,233-Speed 3042.51 samples/sec   Loss 5.2127   LearningRate 0.0126   Epoch: 12   Global Step: 160300   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 16:53:43,597-Speed 3044.41 samples/sec   Loss 5.2532   LearningRate 0.0126   Epoch: 12   Global Step: 160310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:53:46,957-Speed 3048.42 samples/sec   Loss 5.1864   LearningRate 0.0126   Epoch: 12   Global Step: 160320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:53:50,333-Speed 3034.51 samples/sec   Loss 5.2205   LearningRate 0.0126   Epoch: 12   Global Step: 160330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:53:53,700-Speed 3042.11 samples/sec   Loss 5.2789   LearningRate 0.0126   Epoch: 12   Global Step: 160340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:53:57,017-Speed 3087.94 samples/sec   Loss 5.3729   LearningRate 0.0126   Epoch: 12   Global Step: 160350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:54:00,364-Speed 3060.82 samples/sec   Loss 5.3431   LearningRate 0.0126   Epoch: 12   Global Step: 160360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:54:03,810-Speed 2972.73 samples/sec   Loss 5.2134   LearningRate 0.0126   Epoch: 12   Global Step: 160370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:54:07,125-Speed 3089.74 samples/sec   Loss 5.1951   LearningRate 0.0126   Epoch: 12   Global Step: 160380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:54:10,527-Speed 3010.73 samples/sec   Loss 5.3174   LearningRate 0.0126   Epoch: 12   Global Step: 160390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:54:14,007-Speed 2943.65 samples/sec   Loss 5.1813   LearningRate 0.0126   Epoch: 12   Global Step: 160400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:54:17,420-Speed 3001.04 samples/sec   Loss 5.2216   LearningRate 0.0126   Epoch: 12   Global Step: 160410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:54:20,803-Speed 3027.95 samples/sec   Loss 5.2362   LearningRate 0.0125   Epoch: 12   Global Step: 160420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:54:24,223-Speed 2994.25 samples/sec   Loss 5.3243   LearningRate 0.0125   Epoch: 12   Global Step: 160430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:54:27,634-Speed 3003.44 samples/sec   Loss 5.1550   LearningRate 0.0125   Epoch: 12   Global Step: 160440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:54:31,117-Speed 2940.21 samples/sec   Loss 5.2981   LearningRate 0.0125   Epoch: 12   Global Step: 160450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:54:34,501-Speed 3027.65 samples/sec   Loss 5.2316   LearningRate 0.0125   Epoch: 12   Global Step: 160460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:54:37,886-Speed 3025.27 samples/sec   Loss 5.1607   LearningRate 0.0125   Epoch: 12   Global Step: 160470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:54:41,301-Speed 3000.01 samples/sec   Loss 5.2557   LearningRate 0.0125   Epoch: 12   Global Step: 160480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:54:44,716-Speed 2999.52 samples/sec   Loss 5.3073   LearningRate 0.0125   Epoch: 12   Global Step: 160490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:54:48,115-Speed 3013.16 samples/sec   Loss 5.2454   LearningRate 0.0125   Epoch: 12   Global Step: 160500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:54:51,471-Speed 3052.19 samples/sec   Loss 5.2458   LearningRate 0.0125   Epoch: 12   Global Step: 160510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:54:54,906-Speed 2981.84 samples/sec   Loss 5.2243   LearningRate 0.0125   Epoch: 12   Global Step: 160520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:54:58,347-Speed 2976.58 samples/sec   Loss 5.1702   LearningRate 0.0125   Epoch: 12   Global Step: 160530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:55:01,663-Speed 3089.07 samples/sec   Loss 5.2101   LearningRate 0.0125   Epoch: 12   Global Step: 160540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:55:05,155-Speed 2933.62 samples/sec   Loss 5.2153   LearningRate 0.0125   Epoch: 12   Global Step: 160550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:55:08,514-Speed 3050.40 samples/sec   Loss 5.2798   LearningRate 0.0125   Epoch: 12   Global Step: 160560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:55:11,983-Speed 2952.74 samples/sec   Loss 5.1585   LearningRate 0.0125   Epoch: 12   Global Step: 160570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:55:15,387-Speed 3008.83 samples/sec   Loss 5.3420   LearningRate 0.0125   Epoch: 12   Global Step: 160580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:55:18,805-Speed 2996.79 samples/sec   Loss 5.2301   LearningRate 0.0125   Epoch: 12   Global Step: 160590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:55:22,228-Speed 2992.51 samples/sec   Loss 5.2574   LearningRate 0.0125   Epoch: 12   Global Step: 160600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:55:25,564-Speed 3069.94 samples/sec   Loss 5.3146   LearningRate 0.0125   Epoch: 12   Global Step: 160610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:55:28,969-Speed 3008.77 samples/sec   Loss 5.2891   LearningRate 0.0125   Epoch: 12   Global Step: 160620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:55:32,369-Speed 3017.80 samples/sec   Loss 5.3106   LearningRate 0.0125   Epoch: 12   Global Step: 160630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:55:35,726-Speed 3051.63 samples/sec   Loss 5.1655   LearningRate 0.0125   Epoch: 12   Global Step: 160640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:55:39,104-Speed 3031.70 samples/sec   Loss 5.2072   LearningRate 0.0125   Epoch: 12   Global Step: 160650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:55:42,527-Speed 2993.03 samples/sec   Loss 5.1956   LearningRate 0.0125   Epoch: 12   Global Step: 160660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:55:45,919-Speed 3019.53 samples/sec   Loss 5.3228   LearningRate 0.0125   Epoch: 12   Global Step: 160670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:55:49,281-Speed 3047.60 samples/sec   Loss 5.1661   LearningRate 0.0125   Epoch: 12   Global Step: 160680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:55:52,698-Speed 2997.27 samples/sec   Loss 5.2380   LearningRate 0.0125   Epoch: 12   Global Step: 160690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:55:56,204-Speed 2921.08 samples/sec   Loss 5.2116   LearningRate 0.0125   Epoch: 12   Global Step: 160700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:55:59,625-Speed 2993.90 samples/sec   Loss 5.2580   LearningRate 0.0125   Epoch: 12   Global Step: 160710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:56:03,014-Speed 3023.34 samples/sec   Loss 5.1397   LearningRate 0.0125   Epoch: 12   Global Step: 160720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:56:06,470-Speed 2964.13 samples/sec   Loss 5.2316   LearningRate 0.0125   Epoch: 12   Global Step: 160730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:56:09,934-Speed 2956.77 samples/sec   Loss 5.2204   LearningRate 0.0125   Epoch: 12   Global Step: 160740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:56:13,290-Speed 3052.22 samples/sec   Loss 5.3204   LearningRate 0.0125   Epoch: 12   Global Step: 160750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:56:16,734-Speed 2973.59 samples/sec   Loss 5.2624   LearningRate 0.0125   Epoch: 12   Global Step: 160760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:56:20,146-Speed 3003.09 samples/sec   Loss 5.2809   LearningRate 0.0124   Epoch: 12   Global Step: 160770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:56:23,465-Speed 3086.04 samples/sec   Loss 5.2508   LearningRate 0.0124   Epoch: 12   Global Step: 160780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:56:26,812-Speed 3060.21 samples/sec   Loss 5.2129   LearningRate 0.0124   Epoch: 12   Global Step: 160790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:56:30,212-Speed 3012.37 samples/sec   Loss 5.1940   LearningRate 0.0124   Epoch: 12   Global Step: 160800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:56:33,632-Speed 2995.10 samples/sec   Loss 5.3156   LearningRate 0.0124   Epoch: 12   Global Step: 160810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:56:37,022-Speed 3020.97 samples/sec   Loss 5.2160   LearningRate 0.0124   Epoch: 12   Global Step: 160820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 16:56:40,426-Speed 3009.27 samples/sec   Loss 5.3065   LearningRate 0.0124   Epoch: 12   Global Step: 160830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 16:56:43,841-Speed 2999.96 samples/sec   Loss 5.2996   LearningRate 0.0124   Epoch: 12   Global Step: 160840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:56:47,222-Speed 3028.91 samples/sec   Loss 5.3638   LearningRate 0.0124   Epoch: 12   Global Step: 160850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:56:50,596-Speed 3035.66 samples/sec   Loss 5.3055   LearningRate 0.0124   Epoch: 12   Global Step: 160860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:56:54,008-Speed 3001.72 samples/sec   Loss 5.1424   LearningRate 0.0124   Epoch: 12   Global Step: 160870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:56:57,449-Speed 2976.96 samples/sec   Loss 5.1832   LearningRate 0.0124   Epoch: 12   Global Step: 160880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:57:00,766-Speed 3088.43 samples/sec   Loss 5.2016   LearningRate 0.0124   Epoch: 12   Global Step: 160890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:57:04,126-Speed 3048.57 samples/sec   Loss 5.3150   LearningRate 0.0124   Epoch: 12   Global Step: 160900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:57:07,597-Speed 2951.26 samples/sec   Loss 5.2482   LearningRate 0.0124   Epoch: 12   Global Step: 160910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:57:11,018-Speed 2993.67 samples/sec   Loss 5.2566   LearningRate 0.0124   Epoch: 12   Global Step: 160920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:57:14,393-Speed 3035.39 samples/sec   Loss 5.2860   LearningRate 0.0124   Epoch: 12   Global Step: 160930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:57:17,757-Speed 3044.38 samples/sec   Loss 5.2408   LearningRate 0.0124   Epoch: 12   Global Step: 160940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:57:21,200-Speed 2975.36 samples/sec   Loss 5.1642   LearningRate 0.0124   Epoch: 12   Global Step: 160950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:57:24,653-Speed 2966.41 samples/sec   Loss 5.3286   LearningRate 0.0124   Epoch: 12   Global Step: 160960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:57:28,087-Speed 2982.40 samples/sec   Loss 5.2099   LearningRate 0.0124   Epoch: 12   Global Step: 160970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:57:31,572-Speed 2939.75 samples/sec   Loss 5.2847   LearningRate 0.0124   Epoch: 12   Global Step: 160980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:57:34,987-Speed 2998.81 samples/sec   Loss 5.2896   LearningRate 0.0124   Epoch: 12   Global Step: 160990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:57:38,322-Speed 3071.80 samples/sec   Loss 5.1476   LearningRate 0.0124   Epoch: 12   Global Step: 161000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:57:41,753-Speed 2985.36 samples/sec   Loss 5.1827   LearningRate 0.0124   Epoch: 12   Global Step: 161010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:57:45,128-Speed 3034.41 samples/sec   Loss 5.2370   LearningRate 0.0124   Epoch: 12   Global Step: 161020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:57:48,553-Speed 2990.74 samples/sec   Loss 5.2035   LearningRate 0.0124   Epoch: 12   Global Step: 161030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:57:51,940-Speed 3024.60 samples/sec   Loss 5.1686   LearningRate 0.0124   Epoch: 12   Global Step: 161040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:57:55,303-Speed 3045.56 samples/sec   Loss 5.2215   LearningRate 0.0124   Epoch: 12   Global Step: 161050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:57:58,688-Speed 3025.49 samples/sec   Loss 5.3129   LearningRate 0.0124   Epoch: 12   Global Step: 161060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:02,065-Speed 3033.71 samples/sec   Loss 5.3241   LearningRate 0.0124   Epoch: 12   Global Step: 161070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:05,506-Speed 2976.52 samples/sec   Loss 5.1590   LearningRate 0.0124   Epoch: 12   Global Step: 161080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:08,876-Speed 3039.27 samples/sec   Loss 5.3016   LearningRate 0.0124   Epoch: 12   Global Step: 161090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:12,312-Speed 2982.64 samples/sec   Loss 5.2199   LearningRate 0.0124   Epoch: 12   Global Step: 161100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:15,685-Speed 3036.90 samples/sec   Loss 5.1933   LearningRate 0.0124   Epoch: 12   Global Step: 161110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:19,081-Speed 3016.33 samples/sec   Loss 5.1900   LearningRate 0.0123   Epoch: 12   Global Step: 161120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:22,470-Speed 3022.63 samples/sec   Loss 5.1711   LearningRate 0.0123   Epoch: 12   Global Step: 161130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:25,788-Speed 3086.54 samples/sec   Loss 5.1931   LearningRate 0.0123   Epoch: 12   Global Step: 161140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:29,154-Speed 3043.09 samples/sec   Loss 5.1304   LearningRate 0.0123   Epoch: 12   Global Step: 161150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:32,516-Speed 3046.93 samples/sec   Loss 5.1112   LearningRate 0.0123   Epoch: 12   Global Step: 161160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:58:35,852-Speed 3070.30 samples/sec   Loss 5.2386   LearningRate 0.0123   Epoch: 12   Global Step: 161170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:58:39,247-Speed 3017.43 samples/sec   Loss 5.2389   LearningRate 0.0123   Epoch: 12   Global Step: 161180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:58:42,629-Speed 3028.28 samples/sec   Loss 5.1987   LearningRate 0.0123   Epoch: 12   Global Step: 161190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:46,004-Speed 3035.47 samples/sec   Loss 5.2687   LearningRate 0.0123   Epoch: 12   Global Step: 161200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:49,326-Speed 3084.21 samples/sec   Loss 5.2344   LearningRate 0.0123   Epoch: 12   Global Step: 161210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:53,345-Speed 2548.20 samples/sec   Loss 5.2985   LearningRate 0.0123   Epoch: 12   Global Step: 161220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:58:56,785-Speed 2976.94 samples/sec   Loss 5.1946   LearningRate 0.0123   Epoch: 12   Global Step: 161230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:59:00,231-Speed 2972.95 samples/sec   Loss 5.2191   LearningRate 0.0123   Epoch: 12   Global Step: 161240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:59:03,680-Speed 2969.47 samples/sec   Loss 5.2360   LearningRate 0.0123   Epoch: 12   Global Step: 161250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:59:07,103-Speed 2992.55 samples/sec   Loss 5.2188   LearningRate 0.0123   Epoch: 12   Global Step: 161260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:59:10,500-Speed 3015.67 samples/sec   Loss 5.1800   LearningRate 0.0123   Epoch: 12   Global Step: 161270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:59:13,927-Speed 2988.85 samples/sec   Loss 5.1755   LearningRate 0.0123   Epoch: 12   Global Step: 161280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:59:17,330-Speed 3009.85 samples/sec   Loss 5.1072   LearningRate 0.0123   Epoch: 12   Global Step: 161290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:59:20,781-Speed 2968.23 samples/sec   Loss 5.2519   LearningRate 0.0123   Epoch: 12   Global Step: 161300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:59:24,163-Speed 3028.35 samples/sec   Loss 5.1480   LearningRate 0.0123   Epoch: 12   Global Step: 161310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:59:27,635-Speed 2950.52 samples/sec   Loss 5.2030   LearningRate 0.0123   Epoch: 12   Global Step: 161320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:59:31,034-Speed 3014.21 samples/sec   Loss 5.2528   LearningRate 0.0123   Epoch: 12   Global Step: 161330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:59:34,470-Speed 2981.13 samples/sec   Loss 5.2390   LearningRate 0.0123   Epoch: 12   Global Step: 161340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:59:37,876-Speed 3007.27 samples/sec   Loss 5.2501   LearningRate 0.0123   Epoch: 12   Global Step: 161350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:59:41,248-Speed 3037.86 samples/sec   Loss 5.2071   LearningRate 0.0123   Epoch: 12   Global Step: 161360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:59:44,624-Speed 3034.10 samples/sec   Loss 5.3184   LearningRate 0.0123   Epoch: 12   Global Step: 161370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:59:48,042-Speed 2996.81 samples/sec   Loss 5.2652   LearningRate 0.0123   Epoch: 12   Global Step: 161380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 16:59:51,377-Speed 3071.61 samples/sec   Loss 5.1520   LearningRate 0.0123   Epoch: 12   Global Step: 161390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:59:54,797-Speed 2994.74 samples/sec   Loss 5.1613   LearningRate 0.0123   Epoch: 12   Global Step: 161400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 16:59:58,134-Speed 3071.02 samples/sec   Loss 5.2196   LearningRate 0.0123   Epoch: 12   Global Step: 161410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:00:01,488-Speed 3054.14 samples/sec   Loss 5.2892   LearningRate 0.0123   Epoch: 12   Global Step: 161420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:00:04,900-Speed 3001.96 samples/sec   Loss 5.3001   LearningRate 0.0123   Epoch: 12   Global Step: 161430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:00:08,239-Speed 3067.91 samples/sec   Loss 5.2496   LearningRate 0.0123   Epoch: 12   Global Step: 161440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:00:11,605-Speed 3042.37 samples/sec   Loss 5.1671   LearningRate 0.0123   Epoch: 12   Global Step: 161450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:00:15,031-Speed 2990.27 samples/sec   Loss 5.1857   LearningRate 0.0123   Epoch: 12   Global Step: 161460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:00:18,577-Speed 2888.45 samples/sec   Loss 5.1882   LearningRate 0.0123   Epoch: 12   Global Step: 161470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:00:49,768-Speed 328.32 samples/sec   Loss 4.1629   LearningRate 0.0122   Epoch: 13   Global Step: 161480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:00:53,528-Speed 2724.18 samples/sec   Loss 3.8057   LearningRate 0.0122   Epoch: 13   Global Step: 161490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:00:56,839-Speed 3093.72 samples/sec   Loss 3.7816   LearningRate 0.0122   Epoch: 13   Global Step: 161500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:01:00,196-Speed 3051.36 samples/sec   Loss 3.7689   LearningRate 0.0122   Epoch: 13   Global Step: 161510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:01:03,636-Speed 2977.67 samples/sec   Loss 3.8991   LearningRate 0.0122   Epoch: 13   Global Step: 161520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:01:06,997-Speed 3047.70 samples/sec   Loss 3.8470   LearningRate 0.0122   Epoch: 13   Global Step: 161530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:01:10,366-Speed 3040.37 samples/sec   Loss 3.8059   LearningRate 0.0122   Epoch: 13   Global Step: 161540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:01:13,759-Speed 3018.66 samples/sec   Loss 3.9172   LearningRate 0.0122   Epoch: 13   Global Step: 161550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:01:17,199-Speed 2978.18 samples/sec   Loss 3.8364   LearningRate 0.0122   Epoch: 13   Global Step: 161560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:01:20,644-Speed 2973.46 samples/sec   Loss 3.8220   LearningRate 0.0122   Epoch: 13   Global Step: 161570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:01:24,017-Speed 3037.08 samples/sec   Loss 3.7172   LearningRate 0.0122   Epoch: 13   Global Step: 161580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:01:27,488-Speed 2950.75 samples/sec   Loss 3.8377   LearningRate 0.0122   Epoch: 13   Global Step: 161590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:01:30,991-Speed 2924.03 samples/sec   Loss 3.8595   LearningRate 0.0122   Epoch: 13   Global Step: 161600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:01:34,596-Speed 2841.30 samples/sec   Loss 3.8116   LearningRate 0.0122   Epoch: 13   Global Step: 161610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:01:38,301-Speed 2764.25 samples/sec   Loss 3.8510   LearningRate 0.0122   Epoch: 13   Global Step: 161620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:01:41,741-Speed 2978.33 samples/sec   Loss 3.7937   LearningRate 0.0122   Epoch: 13   Global Step: 161630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:01:45,092-Speed 3056.62 samples/sec   Loss 3.8784   LearningRate 0.0122   Epoch: 13   Global Step: 161640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:01:48,479-Speed 3023.95 samples/sec   Loss 3.8700   LearningRate 0.0122   Epoch: 13   Global Step: 161650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:01:51,842-Speed 3046.50 samples/sec   Loss 3.9399   LearningRate 0.0122   Epoch: 13   Global Step: 161660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:01:55,208-Speed 3042.30 samples/sec   Loss 3.7579   LearningRate 0.0122   Epoch: 13   Global Step: 161670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:01:58,615-Speed 3006.70 samples/sec   Loss 3.8176   LearningRate 0.0122   Epoch: 13   Global Step: 161680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:02:02,021-Speed 3007.26 samples/sec   Loss 3.9546   LearningRate 0.0122   Epoch: 13   Global Step: 161690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:02:05,381-Speed 3048.78 samples/sec   Loss 3.7863   LearningRate 0.0122   Epoch: 13   Global Step: 161700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:02:08,745-Speed 3044.83 samples/sec   Loss 3.8083   LearningRate 0.0122   Epoch: 13   Global Step: 161710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:02:12,129-Speed 3026.27 samples/sec   Loss 3.8229   LearningRate 0.0122   Epoch: 13   Global Step: 161720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:02:15,516-Speed 3024.56 samples/sec   Loss 3.8835   LearningRate 0.0122   Epoch: 13   Global Step: 161730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:02:18,996-Speed 2943.41 samples/sec   Loss 3.8304   LearningRate 0.0122   Epoch: 13   Global Step: 161740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:02:22,416-Speed 2995.37 samples/sec   Loss 3.9259   LearningRate 0.0122   Epoch: 13   Global Step: 161750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:02:25,755-Speed 3068.01 samples/sec   Loss 3.8309   LearningRate 0.0122   Epoch: 13   Global Step: 161760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:02:29,179-Speed 2991.40 samples/sec   Loss 3.7665   LearningRate 0.0122   Epoch: 13   Global Step: 161770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:02:32,605-Speed 2989.93 samples/sec   Loss 3.7946   LearningRate 0.0122   Epoch: 13   Global Step: 161780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:02:35,972-Speed 3041.94 samples/sec   Loss 3.8410   LearningRate 0.0122   Epoch: 13   Global Step: 161790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:02:39,350-Speed 3032.25 samples/sec   Loss 3.7649   LearningRate 0.0122   Epoch: 13   Global Step: 161800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:02:42,761-Speed 3003.06 samples/sec   Loss 3.8393   LearningRate 0.0122   Epoch: 13   Global Step: 161810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:02:46,100-Speed 3067.92 samples/sec   Loss 3.8518   LearningRate 0.0122   Epoch: 13   Global Step: 161820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:02:49,573-Speed 2949.67 samples/sec   Loss 3.9590   LearningRate 0.0121   Epoch: 13   Global Step: 161830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:02:53,020-Speed 2971.42 samples/sec   Loss 3.9605   LearningRate 0.0121   Epoch: 13   Global Step: 161840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:02:56,406-Speed 3025.01 samples/sec   Loss 3.9579   LearningRate 0.0121   Epoch: 13   Global Step: 161850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:02:59,841-Speed 2982.33 samples/sec   Loss 3.8656   LearningRate 0.0121   Epoch: 13   Global Step: 161860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:03:03,250-Speed 3004.63 samples/sec   Loss 3.8373   LearningRate 0.0121   Epoch: 13   Global Step: 161870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:03:06,635-Speed 3025.54 samples/sec   Loss 3.8427   LearningRate 0.0121   Epoch: 13   Global Step: 161880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:03:09,995-Speed 3047.96 samples/sec   Loss 3.8777   LearningRate 0.0121   Epoch: 13   Global Step: 161890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:03:13,379-Speed 3026.98 samples/sec   Loss 3.8506   LearningRate 0.0121   Epoch: 13   Global Step: 161900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:03:16,720-Speed 3065.96 samples/sec   Loss 3.9199   LearningRate 0.0121   Epoch: 13   Global Step: 161910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:03:20,039-Speed 3086.70 samples/sec   Loss 3.8829   LearningRate 0.0121   Epoch: 13   Global Step: 161920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:03:23,408-Speed 3040.38 samples/sec   Loss 3.8637   LearningRate 0.0121   Epoch: 13   Global Step: 161930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:03:26,760-Speed 3055.63 samples/sec   Loss 3.9317   LearningRate 0.0121   Epoch: 13   Global Step: 161940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:03:30,123-Speed 3045.38 samples/sec   Loss 3.9290   LearningRate 0.0121   Epoch: 13   Global Step: 161950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:03:33,524-Speed 3012.06 samples/sec   Loss 3.8490   LearningRate 0.0121   Epoch: 13   Global Step: 161960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:03:36,934-Speed 3003.22 samples/sec   Loss 3.9766   LearningRate 0.0121   Epoch: 13   Global Step: 161970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:03:40,340-Speed 3007.74 samples/sec   Loss 3.9480   LearningRate 0.0121   Epoch: 13   Global Step: 161980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:03:43,686-Speed 3060.87 samples/sec   Loss 3.9586   LearningRate 0.0121   Epoch: 13   Global Step: 161990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:03:47,108-Speed 2993.77 samples/sec   Loss 3.9353   LearningRate 0.0121   Epoch: 13   Global Step: 162000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:03:50,452-Speed 3063.33 samples/sec   Loss 4.0452   LearningRate 0.0121   Epoch: 13   Global Step: 162010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:03:53,800-Speed 3060.04 samples/sec   Loss 3.8920   LearningRate 0.0121   Epoch: 13   Global Step: 162020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:03:57,130-Speed 3076.22 samples/sec   Loss 3.9300   LearningRate 0.0121   Epoch: 13   Global Step: 162030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:00,566-Speed 2981.38 samples/sec   Loss 3.9901   LearningRate 0.0121   Epoch: 13   Global Step: 162040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:03,922-Speed 3051.83 samples/sec   Loss 3.9139   LearningRate 0.0121   Epoch: 13   Global Step: 162050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:07,245-Speed 3082.52 samples/sec   Loss 3.9395   LearningRate 0.0121   Epoch: 13   Global Step: 162060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:10,596-Speed 3056.77 samples/sec   Loss 3.8120   LearningRate 0.0121   Epoch: 13   Global Step: 162070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:13,923-Speed 3078.74 samples/sec   Loss 3.9278   LearningRate 0.0121   Epoch: 13   Global Step: 162080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:17,305-Speed 3030.06 samples/sec   Loss 3.9418   LearningRate 0.0121   Epoch: 13   Global Step: 162090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:20,737-Speed 2984.16 samples/sec   Loss 3.8143   LearningRate 0.0121   Epoch: 13   Global Step: 162100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:24,157-Speed 2994.99 samples/sec   Loss 3.9509   LearningRate 0.0121   Epoch: 13   Global Step: 162110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:27,618-Speed 2959.89 samples/sec   Loss 3.9871   LearningRate 0.0121   Epoch: 13   Global Step: 162120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:30,937-Speed 3086.21 samples/sec   Loss 3.9985   LearningRate 0.0121   Epoch: 13   Global Step: 162130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:34,276-Speed 3067.30 samples/sec   Loss 3.8762   LearningRate 0.0121   Epoch: 13   Global Step: 162140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:37,591-Speed 3090.31 samples/sec   Loss 3.8476   LearningRate 0.0121   Epoch: 13   Global Step: 162150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:40,946-Speed 3052.69 samples/sec   Loss 4.0608   LearningRate 0.0121   Epoch: 13   Global Step: 162160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:44,317-Speed 3038.97 samples/sec   Loss 3.9650   LearningRate 0.0121   Epoch: 13   Global Step: 162170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:47,677-Speed 3048.78 samples/sec   Loss 4.0906   LearningRate 0.0121   Epoch: 13   Global Step: 162180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:51,050-Speed 3036.93 samples/sec   Loss 3.9613   LearningRate 0.0120   Epoch: 13   Global Step: 162190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:04:54,433-Speed 3027.36 samples/sec   Loss 3.8214   LearningRate 0.0120   Epoch: 13   Global Step: 162200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:04:57,798-Speed 3044.49 samples/sec   Loss 4.0196   LearningRate 0.0120   Epoch: 13   Global Step: 162210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:01,120-Speed 3082.79 samples/sec   Loss 3.9247   LearningRate 0.0120   Epoch: 13   Global Step: 162220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:04,453-Speed 3072.93 samples/sec   Loss 3.9431   LearningRate 0.0120   Epoch: 13   Global Step: 162230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:07,839-Speed 3025.47 samples/sec   Loss 3.9733   LearningRate 0.0120   Epoch: 13   Global Step: 162240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:11,168-Speed 3076.74 samples/sec   Loss 4.0665   LearningRate 0.0120   Epoch: 13   Global Step: 162250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:14,496-Speed 3077.19 samples/sec   Loss 3.9859   LearningRate 0.0120   Epoch: 13   Global Step: 162260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:17,875-Speed 3032.20 samples/sec   Loss 4.0102   LearningRate 0.0120   Epoch: 13   Global Step: 162270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:21,210-Speed 3070.78 samples/sec   Loss 3.9258   LearningRate 0.0120   Epoch: 13   Global Step: 162280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:24,574-Speed 3044.85 samples/sec   Loss 3.9245   LearningRate 0.0120   Epoch: 13   Global Step: 162290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:27,892-Speed 3087.59 samples/sec   Loss 3.9922   LearningRate 0.0120   Epoch: 13   Global Step: 162300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:05:31,262-Speed 3039.34 samples/sec   Loss 3.9656   LearningRate 0.0120   Epoch: 13   Global Step: 162310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:05:34,670-Speed 3005.46 samples/sec   Loss 4.0008   LearningRate 0.0120   Epoch: 13   Global Step: 162320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:38,172-Speed 2925.33 samples/sec   Loss 3.9164   LearningRate 0.0120   Epoch: 13   Global Step: 162330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:41,607-Speed 2981.91 samples/sec   Loss 4.0772   LearningRate 0.0120   Epoch: 13   Global Step: 162340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:44,940-Speed 3073.26 samples/sec   Loss 3.9411   LearningRate 0.0120   Epoch: 13   Global Step: 162350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:48,295-Speed 3052.78 samples/sec   Loss 4.1733   LearningRate 0.0120   Epoch: 13   Global Step: 162360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:51,699-Speed 3008.60 samples/sec   Loss 4.0131   LearningRate 0.0120   Epoch: 13   Global Step: 162370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:55,155-Speed 2964.74 samples/sec   Loss 4.0899   LearningRate 0.0120   Epoch: 13   Global Step: 162380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:05:58,552-Speed 3015.10 samples/sec   Loss 3.9861   LearningRate 0.0120   Epoch: 13   Global Step: 162390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:06:01,961-Speed 3004.08 samples/sec   Loss 4.0054   LearningRate 0.0120   Epoch: 13   Global Step: 162400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:06:05,307-Speed 3061.25 samples/sec   Loss 4.0759   LearningRate 0.0120   Epoch: 13   Global Step: 162410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:06:08,750-Speed 2975.27 samples/sec   Loss 3.9550   LearningRate 0.0120   Epoch: 13   Global Step: 162420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:06:12,092-Speed 3065.73 samples/sec   Loss 4.0321   LearningRate 0.0120   Epoch: 13   Global Step: 162430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:06:15,472-Speed 3029.66 samples/sec   Loss 4.0037   LearningRate 0.0120   Epoch: 13   Global Step: 162440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:06:18,796-Speed 3081.96 samples/sec   Loss 4.0727   LearningRate 0.0120   Epoch: 13   Global Step: 162450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:06:22,158-Speed 3046.48 samples/sec   Loss 4.0263   LearningRate 0.0120   Epoch: 13   Global Step: 162460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:06:25,633-Speed 2947.66 samples/sec   Loss 4.0023   LearningRate 0.0120   Epoch: 13   Global Step: 162470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:06:29,029-Speed 3016.12 samples/sec   Loss 4.0500   LearningRate 0.0120   Epoch: 13   Global Step: 162480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:06:32,499-Speed 2952.62 samples/sec   Loss 4.0161   LearningRate 0.0120   Epoch: 13   Global Step: 162490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:06:35,966-Speed 2954.60 samples/sec   Loss 4.0294   LearningRate 0.0120   Epoch: 13   Global Step: 162500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:06:39,339-Speed 3036.97 samples/sec   Loss 4.0341   LearningRate 0.0120   Epoch: 13   Global Step: 162510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:06:42,667-Speed 3077.66 samples/sec   Loss 4.0919   LearningRate 0.0120   Epoch: 13   Global Step: 162520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:06:46,038-Speed 3038.42 samples/sec   Loss 4.0806   LearningRate 0.0120   Epoch: 13   Global Step: 162530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:06:49,419-Speed 3030.05 samples/sec   Loss 4.0191   LearningRate 0.0120   Epoch: 13   Global Step: 162540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:06:52,839-Speed 2994.16 samples/sec   Loss 4.0572   LearningRate 0.0119   Epoch: 13   Global Step: 162550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:06:56,279-Speed 2978.07 samples/sec   Loss 4.0834   LearningRate 0.0119   Epoch: 13   Global Step: 162560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:06:59,699-Speed 2994.62 samples/sec   Loss 4.0891   LearningRate 0.0119   Epoch: 13   Global Step: 162570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:07:03,088-Speed 3022.21 samples/sec   Loss 4.1022   LearningRate 0.0119   Epoch: 13   Global Step: 162580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:07:06,418-Speed 3076.45 samples/sec   Loss 4.0666   LearningRate 0.0119   Epoch: 13   Global Step: 162590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:07:09,807-Speed 3022.47 samples/sec   Loss 4.0966   LearningRate 0.0119   Epoch: 13   Global Step: 162600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:07:13,117-Speed 3094.99 samples/sec   Loss 4.0918   LearningRate 0.0119   Epoch: 13   Global Step: 162610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:07:16,444-Speed 3077.90 samples/sec   Loss 4.1149   LearningRate 0.0119   Epoch: 13   Global Step: 162620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:07:19,881-Speed 2980.85 samples/sec   Loss 4.1185   LearningRate 0.0119   Epoch: 13   Global Step: 162630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:07:23,283-Speed 3011.41 samples/sec   Loss 4.0499   LearningRate 0.0119   Epoch: 13   Global Step: 162640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:07:26,677-Speed 3017.95 samples/sec   Loss 4.0765   LearningRate 0.0119   Epoch: 13   Global Step: 162650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:07:30,026-Speed 3057.74 samples/sec   Loss 4.0757   LearningRate 0.0119   Epoch: 13   Global Step: 162660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:07:33,414-Speed 3023.40 samples/sec   Loss 3.9936   LearningRate 0.0119   Epoch: 13   Global Step: 162670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:07:36,885-Speed 2951.20 samples/sec   Loss 4.0905   LearningRate 0.0119   Epoch: 13   Global Step: 162680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:07:40,301-Speed 2998.99 samples/sec   Loss 4.0960   LearningRate 0.0119   Epoch: 13   Global Step: 162690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:07:43,646-Speed 3061.36 samples/sec   Loss 4.0205   LearningRate 0.0119   Epoch: 13   Global Step: 162700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:07:47,049-Speed 3010.36 samples/sec   Loss 4.1395   LearningRate 0.0119   Epoch: 13   Global Step: 162710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:07:50,488-Speed 2978.53 samples/sec   Loss 4.0789   LearningRate 0.0119   Epoch: 13   Global Step: 162720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:07:53,865-Speed 3032.91 samples/sec   Loss 4.1465   LearningRate 0.0119   Epoch: 13   Global Step: 162730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:07:57,242-Speed 3033.32 samples/sec   Loss 4.0205   LearningRate 0.0119   Epoch: 13   Global Step: 162740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:08:00,597-Speed 3053.08 samples/sec   Loss 4.1008   LearningRate 0.0119   Epoch: 13   Global Step: 162750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:08:03,913-Speed 3088.20 samples/sec   Loss 4.0959   LearningRate 0.0119   Epoch: 13   Global Step: 162760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:08:07,232-Speed 3086.62 samples/sec   Loss 4.2306   LearningRate 0.0119   Epoch: 13   Global Step: 162770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:08:10,652-Speed 2995.18 samples/sec   Loss 4.0956   LearningRate 0.0119   Epoch: 13   Global Step: 162780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:08:14,104-Speed 2967.02 samples/sec   Loss 4.1297   LearningRate 0.0119   Epoch: 13   Global Step: 162790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:08:17,421-Speed 3087.98 samples/sec   Loss 4.0700   LearningRate 0.0119   Epoch: 13   Global Step: 162800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:08:20,756-Speed 3071.37 samples/sec   Loss 4.1866   LearningRate 0.0119   Epoch: 13   Global Step: 162810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:08:24,124-Speed 3040.80 samples/sec   Loss 4.0368   LearningRate 0.0119   Epoch: 13   Global Step: 162820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:08:27,448-Speed 3083.84 samples/sec   Loss 4.0841   LearningRate 0.0119   Epoch: 13   Global Step: 162830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:08:30,874-Speed 2989.78 samples/sec   Loss 4.0836   LearningRate 0.0119   Epoch: 13   Global Step: 162840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:08:34,287-Speed 3000.58 samples/sec   Loss 4.1906   LearningRate 0.0119   Epoch: 13   Global Step: 162850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:08:37,724-Speed 2980.27 samples/sec   Loss 4.1209   LearningRate 0.0119   Epoch: 13   Global Step: 162860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:08:41,076-Speed 3055.64 samples/sec   Loss 4.1398   LearningRate 0.0119   Epoch: 13   Global Step: 162870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:08:44,455-Speed 3031.21 samples/sec   Loss 4.1798   LearningRate 0.0119   Epoch: 13   Global Step: 162880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:08:47,845-Speed 3022.32 samples/sec   Loss 4.1270   LearningRate 0.0119   Epoch: 13   Global Step: 162890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:08:51,242-Speed 3015.23 samples/sec   Loss 4.0835   LearningRate 0.0119   Epoch: 13   Global Step: 162900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:08:54,631-Speed 3021.57 samples/sec   Loss 4.1918   LearningRate 0.0118   Epoch: 13   Global Step: 162910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:08:58,032-Speed 3011.72 samples/sec   Loss 4.2050   LearningRate 0.0118   Epoch: 13   Global Step: 162920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:09:01,431-Speed 3014.14 samples/sec   Loss 4.2149   LearningRate 0.0118   Epoch: 13   Global Step: 162930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:09:04,775-Speed 3062.31 samples/sec   Loss 4.1610   LearningRate 0.0118   Epoch: 13   Global Step: 162940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:09:08,117-Speed 3065.39 samples/sec   Loss 4.1328   LearningRate 0.0118   Epoch: 13   Global Step: 162950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:09:11,491-Speed 3035.49 samples/sec   Loss 4.1529   LearningRate 0.0118   Epoch: 13   Global Step: 162960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:09:14,818-Speed 3078.46 samples/sec   Loss 4.1972   LearningRate 0.0118   Epoch: 13   Global Step: 162970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:09:18,199-Speed 3029.76 samples/sec   Loss 4.2673   LearningRate 0.0118   Epoch: 13   Global Step: 162980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:09:21,611-Speed 3002.50 samples/sec   Loss 4.1626   LearningRate 0.0118   Epoch: 13   Global Step: 162990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:09:24,997-Speed 3024.71 samples/sec   Loss 4.1632   LearningRate 0.0118   Epoch: 13   Global Step: 163000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:09:28,337-Speed 3066.90 samples/sec   Loss 4.2304   LearningRate 0.0118   Epoch: 13   Global Step: 163010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:09:31,769-Speed 2984.35 samples/sec   Loss 4.1807   LearningRate 0.0118   Epoch: 13   Global Step: 163020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:09:35,123-Speed 3053.75 samples/sec   Loss 4.1825   LearningRate 0.0118   Epoch: 13   Global Step: 163030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:09:38,486-Speed 3045.64 samples/sec   Loss 4.1462   LearningRate 0.0118   Epoch: 13   Global Step: 163040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:09:41,841-Speed 3053.46 samples/sec   Loss 4.1650   LearningRate 0.0118   Epoch: 13   Global Step: 163050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:09:45,174-Speed 3073.01 samples/sec   Loss 4.2414   LearningRate 0.0118   Epoch: 13   Global Step: 163060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:09:48,504-Speed 3076.18 samples/sec   Loss 4.2414   LearningRate 0.0118   Epoch: 13   Global Step: 163070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:09:51,897-Speed 3018.66 samples/sec   Loss 4.2509   LearningRate 0.0118   Epoch: 13   Global Step: 163080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:09:55,235-Speed 3068.12 samples/sec   Loss 4.2716   LearningRate 0.0118   Epoch: 13   Global Step: 163090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:09:58,699-Speed 2957.21 samples/sec   Loss 4.1236   LearningRate 0.0118   Epoch: 13   Global Step: 163100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:10:02,158-Speed 2960.97 samples/sec   Loss 4.1207   LearningRate 0.0118   Epoch: 13   Global Step: 163110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:10:05,537-Speed 3031.91 samples/sec   Loss 4.2433   LearningRate 0.0118   Epoch: 13   Global Step: 163120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:10:08,962-Speed 2990.92 samples/sec   Loss 4.2492   LearningRate 0.0118   Epoch: 13   Global Step: 163130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:10:12,351-Speed 3022.26 samples/sec   Loss 4.2776   LearningRate 0.0118   Epoch: 13   Global Step: 163140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:10:15,666-Speed 3089.64 samples/sec   Loss 4.2430   LearningRate 0.0118   Epoch: 13   Global Step: 163150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:10:19,090-Speed 2992.06 samples/sec   Loss 4.1241   LearningRate 0.0118   Epoch: 13   Global Step: 163160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:10:22,516-Speed 2989.71 samples/sec   Loss 4.1665   LearningRate 0.0118   Epoch: 13   Global Step: 163170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:10:25,903-Speed 3023.80 samples/sec   Loss 4.2282   LearningRate 0.0118   Epoch: 13   Global Step: 163180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:10:29,257-Speed 3053.86 samples/sec   Loss 4.2342   LearningRate 0.0118   Epoch: 13   Global Step: 163190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:10:32,657-Speed 3012.74 samples/sec   Loss 4.1700   LearningRate 0.0118   Epoch: 13   Global Step: 163200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:10:36,068-Speed 3002.86 samples/sec   Loss 4.1656   LearningRate 0.0118   Epoch: 13   Global Step: 163210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:10:39,426-Speed 3049.80 samples/sec   Loss 4.1698   LearningRate 0.0118   Epoch: 13   Global Step: 163220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:10:42,766-Speed 3066.61 samples/sec   Loss 4.2152   LearningRate 0.0118   Epoch: 13   Global Step: 163230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:10:46,100-Speed 3072.56 samples/sec   Loss 4.1760   LearningRate 0.0118   Epoch: 13   Global Step: 163240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:10:49,622-Speed 2908.46 samples/sec   Loss 4.1735   LearningRate 0.0118   Epoch: 13   Global Step: 163250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:10:53,083-Speed 2960.26 samples/sec   Loss 4.1781   LearningRate 0.0118   Epoch: 13   Global Step: 163260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:10:56,463-Speed 3030.64 samples/sec   Loss 4.2397   LearningRate 0.0117   Epoch: 13   Global Step: 163270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:10:59,858-Speed 3016.40 samples/sec   Loss 4.2745   LearningRate 0.0117   Epoch: 13   Global Step: 163280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:11:03,295-Speed 2980.14 samples/sec   Loss 4.1263   LearningRate 0.0117   Epoch: 13   Global Step: 163290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:11:06,690-Speed 3017.39 samples/sec   Loss 4.1905   LearningRate 0.0117   Epoch: 13   Global Step: 163300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:11:10,120-Speed 2986.33 samples/sec   Loss 4.2020   LearningRate 0.0117   Epoch: 13   Global Step: 163310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:11:13,503-Speed 3027.70 samples/sec   Loss 4.1637   LearningRate 0.0117   Epoch: 13   Global Step: 163320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:11:16,925-Speed 2992.65 samples/sec   Loss 4.1308   LearningRate 0.0117   Epoch: 13   Global Step: 163330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:11:20,331-Speed 3007.76 samples/sec   Loss 4.1824   LearningRate 0.0117   Epoch: 13   Global Step: 163340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:11:23,722-Speed 3020.63 samples/sec   Loss 4.2840   LearningRate 0.0117   Epoch: 13   Global Step: 163350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:11:27,084-Speed 3046.67 samples/sec   Loss 4.2807   LearningRate 0.0117   Epoch: 13   Global Step: 163360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:11:30,531-Speed 2971.68 samples/sec   Loss 4.2229   LearningRate 0.0117   Epoch: 13   Global Step: 163370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:11:34,041-Speed 2918.68 samples/sec   Loss 4.3116   LearningRate 0.0117   Epoch: 13   Global Step: 163380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:11:37,515-Speed 2948.95 samples/sec   Loss 4.3037   LearningRate 0.0117   Epoch: 13   Global Step: 163390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:11:40,984-Speed 2952.71 samples/sec   Loss 4.1818   LearningRate 0.0117   Epoch: 13   Global Step: 163400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:11:44,437-Speed 2966.35 samples/sec   Loss 4.2607   LearningRate 0.0117   Epoch: 13   Global Step: 163410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:11:47,912-Speed 2946.88 samples/sec   Loss 4.2461   LearningRate 0.0117   Epoch: 13   Global Step: 163420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:11:51,353-Speed 2977.26 samples/sec   Loss 4.1110   LearningRate 0.0117   Epoch: 13   Global Step: 163430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:11:54,885-Speed 2899.94 samples/sec   Loss 4.2372   LearningRate 0.0117   Epoch: 13   Global Step: 163440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:11:58,348-Speed 2957.94 samples/sec   Loss 4.3073   LearningRate 0.0117   Epoch: 13   Global Step: 163450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:12:01,751-Speed 3009.75 samples/sec   Loss 4.2366   LearningRate 0.0117   Epoch: 13   Global Step: 163460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:05,104-Speed 3055.13 samples/sec   Loss 4.3395   LearningRate 0.0117   Epoch: 13   Global Step: 163470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:08,521-Speed 2997.26 samples/sec   Loss 4.2869   LearningRate 0.0117   Epoch: 13   Global Step: 163480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:11,985-Speed 2956.94 samples/sec   Loss 4.2825   LearningRate 0.0117   Epoch: 13   Global Step: 163490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:15,415-Speed 2986.73 samples/sec   Loss 4.2614   LearningRate 0.0117   Epoch: 13   Global Step: 163500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:18,824-Speed 3004.84 samples/sec   Loss 4.2165   LearningRate 0.0117   Epoch: 13   Global Step: 163510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:22,259-Speed 2981.85 samples/sec   Loss 4.1936   LearningRate 0.0117   Epoch: 13   Global Step: 163520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:25,726-Speed 2954.06 samples/sec   Loss 4.2199   LearningRate 0.0117   Epoch: 13   Global Step: 163530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:29,180-Speed 2965.50 samples/sec   Loss 4.1824   LearningRate 0.0117   Epoch: 13   Global Step: 163540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:32,565-Speed 3026.20 samples/sec   Loss 4.2497   LearningRate 0.0117   Epoch: 13   Global Step: 163550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:35,923-Speed 3049.98 samples/sec   Loss 4.2953   LearningRate 0.0117   Epoch: 13   Global Step: 163560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:12:39,309-Speed 3025.43 samples/sec   Loss 4.1970   LearningRate 0.0117   Epoch: 13   Global Step: 163570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:42,696-Speed 3024.70 samples/sec   Loss 4.2752   LearningRate 0.0117   Epoch: 13   Global Step: 163580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:46,073-Speed 3032.23 samples/sec   Loss 4.2199   LearningRate 0.0117   Epoch: 13   Global Step: 163590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:49,442-Speed 3040.15 samples/sec   Loss 4.3102   LearningRate 0.0117   Epoch: 13   Global Step: 163600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:52,821-Speed 3032.07 samples/sec   Loss 4.2745   LearningRate 0.0117   Epoch: 13   Global Step: 163610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:56,128-Speed 3096.85 samples/sec   Loss 4.2956   LearningRate 0.0117   Epoch: 13   Global Step: 163620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:12:59,461-Speed 3073.97 samples/sec   Loss 4.3057   LearningRate 0.0116   Epoch: 13   Global Step: 163630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:13:02,786-Speed 3080.12 samples/sec   Loss 4.2670   LearningRate 0.0116   Epoch: 13   Global Step: 163640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:13:06,186-Speed 3012.69 samples/sec   Loss 4.2500   LearningRate 0.0116   Epoch: 13   Global Step: 163650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:13:09,571-Speed 3026.22 samples/sec   Loss 4.2155   LearningRate 0.0116   Epoch: 13   Global Step: 163660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:13:13,001-Speed 2985.69 samples/sec   Loss 4.2942   LearningRate 0.0116   Epoch: 13   Global Step: 163670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:13:16,420-Speed 2996.19 samples/sec   Loss 4.2161   LearningRate 0.0116   Epoch: 13   Global Step: 163680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:13:19,857-Speed 2980.74 samples/sec   Loss 4.1931   LearningRate 0.0116   Epoch: 13   Global Step: 163690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:13:23,222-Speed 3044.31 samples/sec   Loss 4.2723   LearningRate 0.0116   Epoch: 13   Global Step: 163700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:13:26,542-Speed 3085.17 samples/sec   Loss 4.3704   LearningRate 0.0116   Epoch: 13   Global Step: 163710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:13:29,882-Speed 3066.92 samples/sec   Loss 4.2590   LearningRate 0.0116   Epoch: 13   Global Step: 163720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:13:33,295-Speed 3001.22 samples/sec   Loss 4.2543   LearningRate 0.0116   Epoch: 13   Global Step: 163730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:13:36,714-Speed 2995.74 samples/sec   Loss 4.3364   LearningRate 0.0116   Epoch: 13   Global Step: 163740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:13:40,106-Speed 3020.05 samples/sec   Loss 4.2609   LearningRate 0.0116   Epoch: 13   Global Step: 163750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:13:43,485-Speed 3031.72 samples/sec   Loss 4.2898   LearningRate 0.0116   Epoch: 13   Global Step: 163760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:13:46,855-Speed 3039.04 samples/sec   Loss 4.2434   LearningRate 0.0116   Epoch: 13   Global Step: 163770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:13:50,188-Speed 3073.72 samples/sec   Loss 4.2293   LearningRate 0.0116   Epoch: 13   Global Step: 163780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:13:53,506-Speed 3087.40 samples/sec   Loss 4.1982   LearningRate 0.0116   Epoch: 13   Global Step: 163790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:13:56,822-Speed 3088.55 samples/sec   Loss 4.3564   LearningRate 0.0116   Epoch: 13   Global Step: 163800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:14:00,165-Speed 3063.88 samples/sec   Loss 4.3031   LearningRate 0.0116   Epoch: 13   Global Step: 163810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:14:03,537-Speed 3037.48 samples/sec   Loss 4.2464   LearningRate 0.0116   Epoch: 13   Global Step: 163820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:14:06,875-Speed 3068.63 samples/sec   Loss 4.3397   LearningRate 0.0116   Epoch: 13   Global Step: 163830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:10,265-Speed 3021.59 samples/sec   Loss 4.2955   LearningRate 0.0116   Epoch: 13   Global Step: 163840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:13,647-Speed 3028.49 samples/sec   Loss 4.2220   LearningRate 0.0116   Epoch: 13   Global Step: 163850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:17,054-Speed 3006.29 samples/sec   Loss 4.4015   LearningRate 0.0116   Epoch: 13   Global Step: 163860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:20,453-Speed 3013.65 samples/sec   Loss 4.2687   LearningRate 0.0116   Epoch: 13   Global Step: 163870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:23,777-Speed 3081.20 samples/sec   Loss 4.2637   LearningRate 0.0116   Epoch: 13   Global Step: 163880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:27,124-Speed 3060.91 samples/sec   Loss 4.2315   LearningRate 0.0116   Epoch: 13   Global Step: 163890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:30,510-Speed 3025.32 samples/sec   Loss 4.3162   LearningRate 0.0116   Epoch: 13   Global Step: 163900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:33,934-Speed 2991.32 samples/sec   Loss 4.1595   LearningRate 0.0116   Epoch: 13   Global Step: 163910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:37,388-Speed 2965.60 samples/sec   Loss 4.2680   LearningRate 0.0116   Epoch: 13   Global Step: 163920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:40,752-Speed 3045.24 samples/sec   Loss 4.2094   LearningRate 0.0116   Epoch: 13   Global Step: 163930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:44,164-Speed 3001.86 samples/sec   Loss 4.2980   LearningRate 0.0116   Epoch: 13   Global Step: 163940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:47,566-Speed 3011.09 samples/sec   Loss 4.3257   LearningRate 0.0116   Epoch: 13   Global Step: 163950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:50,962-Speed 3016.29 samples/sec   Loss 4.3764   LearningRate 0.0116   Epoch: 13   Global Step: 163960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:54,363-Speed 3011.04 samples/sec   Loss 4.3225   LearningRate 0.0116   Epoch: 13   Global Step: 163970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:14:57,755-Speed 3019.57 samples/sec   Loss 4.2844   LearningRate 0.0116   Epoch: 13   Global Step: 163980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:01,068-Speed 3091.92 samples/sec   Loss 4.3072   LearningRate 0.0116   Epoch: 13   Global Step: 163990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:04,469-Speed 3011.75 samples/sec   Loss 4.3493   LearningRate 0.0115   Epoch: 13   Global Step: 164000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:07,872-Speed 3010.14 samples/sec   Loss 4.3445   LearningRate 0.0115   Epoch: 13   Global Step: 164010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:11,274-Speed 3010.67 samples/sec   Loss 4.3105   LearningRate 0.0115   Epoch: 13   Global Step: 164020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:14,642-Speed 3041.66 samples/sec   Loss 4.1768   LearningRate 0.0115   Epoch: 13   Global Step: 164030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:18,095-Speed 2966.41 samples/sec   Loss 4.2028   LearningRate 0.0115   Epoch: 13   Global Step: 164040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:21,482-Speed 3023.70 samples/sec   Loss 4.3207   LearningRate 0.0115   Epoch: 13   Global Step: 164050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:24,921-Speed 2978.77 samples/sec   Loss 4.3715   LearningRate 0.0115   Epoch: 13   Global Step: 164060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:28,270-Speed 3059.13 samples/sec   Loss 4.3491   LearningRate 0.0115   Epoch: 13   Global Step: 164070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:31,690-Speed 2994.32 samples/sec   Loss 4.2816   LearningRate 0.0115   Epoch: 13   Global Step: 164080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:35,090-Speed 3013.02 samples/sec   Loss 4.3811   LearningRate 0.0115   Epoch: 13   Global Step: 164090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:38,508-Speed 2996.77 samples/sec   Loss 4.3362   LearningRate 0.0115   Epoch: 13   Global Step: 164100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:41,836-Speed 3078.26 samples/sec   Loss 4.3110   LearningRate 0.0115   Epoch: 13   Global Step: 164110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:45,182-Speed 3061.16 samples/sec   Loss 4.3369   LearningRate 0.0115   Epoch: 13   Global Step: 164120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:15:48,638-Speed 2963.62 samples/sec   Loss 4.4086   LearningRate 0.0115   Epoch: 13   Global Step: 164130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:15:52,046-Speed 3005.66 samples/sec   Loss 4.3347   LearningRate 0.0115   Epoch: 13   Global Step: 164140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:15:55,448-Speed 3010.96 samples/sec   Loss 4.3856   LearningRate 0.0115   Epoch: 13   Global Step: 164150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:15:58,776-Speed 3077.91 samples/sec   Loss 4.3212   LearningRate 0.0115   Epoch: 13   Global Step: 164160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:16:02,145-Speed 3040.61 samples/sec   Loss 4.3376   LearningRate 0.0115   Epoch: 13   Global Step: 164170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:16:05,633-Speed 2936.70 samples/sec   Loss 4.3637   LearningRate 0.0115   Epoch: 13   Global Step: 164180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:16:09,093-Speed 2960.51 samples/sec   Loss 4.3752   LearningRate 0.0115   Epoch: 13   Global Step: 164190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:16:12,468-Speed 3035.32 samples/sec   Loss 4.4229   LearningRate 0.0115   Epoch: 13   Global Step: 164200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:16:15,818-Speed 3056.90 samples/sec   Loss 4.2849   LearningRate 0.0115   Epoch: 13   Global Step: 164210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:16:19,237-Speed 2996.47 samples/sec   Loss 4.3500   LearningRate 0.0115   Epoch: 13   Global Step: 164220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:16:22,606-Speed 3039.67 samples/sec   Loss 4.4081   LearningRate 0.0115   Epoch: 13   Global Step: 164230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-27 17:16:25,972-Speed 3042.77 samples/sec   Loss 4.4600   LearningRate 0.0115   Epoch: 13   Global Step: 164240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:16:29,368-Speed 3016.27 samples/sec   Loss 4.3536   LearningRate 0.0115   Epoch: 13   Global Step: 164250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:16:32,818-Speed 2969.62 samples/sec   Loss 4.2708   LearningRate 0.0115   Epoch: 13   Global Step: 164260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:16:36,236-Speed 2996.13 samples/sec   Loss 4.3641   LearningRate 0.0115   Epoch: 13   Global Step: 164270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:16:39,586-Speed 3057.54 samples/sec   Loss 4.2887   LearningRate 0.0115   Epoch: 13   Global Step: 164280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:16:42,922-Speed 3070.09 samples/sec   Loss 4.4306   LearningRate 0.0115   Epoch: 13   Global Step: 164290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:16:46,302-Speed 3032.10 samples/sec   Loss 4.3775   LearningRate 0.0115   Epoch: 13   Global Step: 164300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:16:49,706-Speed 3008.67 samples/sec   Loss 4.3794   LearningRate 0.0115   Epoch: 13   Global Step: 164310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:16:53,147-Speed 2976.66 samples/sec   Loss 4.3573   LearningRate 0.0115   Epoch: 13   Global Step: 164320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:16:56,586-Speed 2978.50 samples/sec   Loss 4.4134   LearningRate 0.0115   Epoch: 13   Global Step: 164330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:17:00,079-Speed 2932.85 samples/sec   Loss 4.3999   LearningRate 0.0115   Epoch: 13   Global Step: 164340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:17:03,493-Speed 3000.02 samples/sec   Loss 4.3671   LearningRate 0.0115   Epoch: 13   Global Step: 164350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:17:07,002-Speed 2918.96 samples/sec   Loss 4.3317   LearningRate 0.0115   Epoch: 13   Global Step: 164360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:17:10,403-Speed 3012.27 samples/sec   Loss 4.3332   LearningRate 0.0114   Epoch: 13   Global Step: 164370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:17:13,811-Speed 3005.46 samples/sec   Loss 4.3725   LearningRate 0.0114   Epoch: 13   Global Step: 164380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:17:17,185-Speed 3035.28 samples/sec   Loss 4.2982   LearningRate 0.0114   Epoch: 13   Global Step: 164390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:17:20,609-Speed 2991.86 samples/sec   Loss 4.3257   LearningRate 0.0114   Epoch: 13   Global Step: 164400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:17:23,971-Speed 3046.28 samples/sec   Loss 4.3238   LearningRate 0.0114   Epoch: 13   Global Step: 164410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:17:27,335-Speed 3044.98 samples/sec   Loss 4.3797   LearningRate 0.0114   Epoch: 13   Global Step: 164420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:17:30,721-Speed 3025.53 samples/sec   Loss 4.3365   LearningRate 0.0114   Epoch: 13   Global Step: 164430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:17:34,145-Speed 2991.52 samples/sec   Loss 4.3040   LearningRate 0.0114   Epoch: 13   Global Step: 164440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:17:37,554-Speed 3004.00 samples/sec   Loss 4.3608   LearningRate 0.0114   Epoch: 13   Global Step: 164450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:17:40,959-Speed 3008.51 samples/sec   Loss 4.2930   LearningRate 0.0114   Epoch: 13   Global Step: 164460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:17:44,405-Speed 2971.94 samples/sec   Loss 4.4680   LearningRate 0.0114   Epoch: 13   Global Step: 164470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:17:47,820-Speed 2999.82 samples/sec   Loss 4.3807   LearningRate 0.0114   Epoch: 13   Global Step: 164480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:17:51,247-Speed 2988.98 samples/sec   Loss 4.4476   LearningRate 0.0114   Epoch: 13   Global Step: 164490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:17:54,565-Speed 3086.97 samples/sec   Loss 4.3018   LearningRate 0.0114   Epoch: 13   Global Step: 164500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:17:57,944-Speed 3031.12 samples/sec   Loss 4.3494   LearningRate 0.0114   Epoch: 13   Global Step: 164510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:01,326-Speed 3029.47 samples/sec   Loss 4.3451   LearningRate 0.0114   Epoch: 13   Global Step: 164520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:04,722-Speed 3015.24 samples/sec   Loss 4.3896   LearningRate 0.0114   Epoch: 13   Global Step: 164530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:08,130-Speed 3005.68 samples/sec   Loss 4.3816   LearningRate 0.0114   Epoch: 13   Global Step: 164540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:11,468-Speed 3068.74 samples/sec   Loss 4.2773   LearningRate 0.0114   Epoch: 13   Global Step: 164550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:14,818-Speed 3057.61 samples/sec   Loss 4.3574   LearningRate 0.0114   Epoch: 13   Global Step: 164560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:18,219-Speed 3012.21 samples/sec   Loss 4.4007   LearningRate 0.0114   Epoch: 13   Global Step: 164570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:21,574-Speed 3052.36 samples/sec   Loss 4.4411   LearningRate 0.0114   Epoch: 13   Global Step: 164580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:24,960-Speed 3025.34 samples/sec   Loss 4.3813   LearningRate 0.0114   Epoch: 13   Global Step: 164590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:28,320-Speed 3048.54 samples/sec   Loss 4.2915   LearningRate 0.0114   Epoch: 13   Global Step: 164600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:18:31,709-Speed 3022.10 samples/sec   Loss 4.4771   LearningRate 0.0114   Epoch: 13   Global Step: 164610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:18:35,103-Speed 3018.18 samples/sec   Loss 4.3611   LearningRate 0.0114   Epoch: 13   Global Step: 164620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:18:38,521-Speed 2996.62 samples/sec   Loss 4.4147   LearningRate 0.0114   Epoch: 13   Global Step: 164630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:18:41,957-Speed 2981.04 samples/sec   Loss 4.3568   LearningRate 0.0114   Epoch: 13   Global Step: 164640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:18:45,291-Speed 3072.87 samples/sec   Loss 4.4081   LearningRate 0.0114   Epoch: 13   Global Step: 164650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:48,671-Speed 3030.86 samples/sec   Loss 4.4164   LearningRate 0.0114   Epoch: 13   Global Step: 164660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:51,996-Speed 3080.48 samples/sec   Loss 4.4411   LearningRate 0.0114   Epoch: 13   Global Step: 164670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:55,341-Speed 3062.48 samples/sec   Loss 4.3385   LearningRate 0.0114   Epoch: 13   Global Step: 164680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:18:58,747-Speed 3007.57 samples/sec   Loss 4.4723   LearningRate 0.0114   Epoch: 13   Global Step: 164690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:19:02,208-Speed 2959.10 samples/sec   Loss 4.3517   LearningRate 0.0114   Epoch: 13   Global Step: 164700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:19:05,616-Speed 3005.27 samples/sec   Loss 4.3725   LearningRate 0.0114   Epoch: 13   Global Step: 164710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:19:09,006-Speed 3021.47 samples/sec   Loss 4.3967   LearningRate 0.0114   Epoch: 13   Global Step: 164720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:19:12,415-Speed 3004.53 samples/sec   Loss 4.3934   LearningRate 0.0113   Epoch: 13   Global Step: 164730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:19:15,771-Speed 3052.37 samples/sec   Loss 4.4589   LearningRate 0.0113   Epoch: 13   Global Step: 164740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:19:19,240-Speed 2952.29 samples/sec   Loss 4.3480   LearningRate 0.0113   Epoch: 13   Global Step: 164750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:19:22,690-Speed 2968.90 samples/sec   Loss 4.4787   LearningRate 0.0113   Epoch: 13   Global Step: 164760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:19:26,182-Speed 2933.63 samples/sec   Loss 4.3771   LearningRate 0.0113   Epoch: 13   Global Step: 164770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:19:29,633-Speed 2967.83 samples/sec   Loss 4.4415   LearningRate 0.0113   Epoch: 13   Global Step: 164780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:19:32,970-Speed 3069.84 samples/sec   Loss 4.3787   LearningRate 0.0113   Epoch: 13   Global Step: 164790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:19:36,363-Speed 3018.11 samples/sec   Loss 4.3883   LearningRate 0.0113   Epoch: 13   Global Step: 164800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:19:39,826-Speed 2958.04 samples/sec   Loss 4.3813   LearningRate 0.0113   Epoch: 13   Global Step: 164810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:19:43,207-Speed 3029.73 samples/sec   Loss 4.3349   LearningRate 0.0113   Epoch: 13   Global Step: 164820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:19:46,541-Speed 3071.53 samples/sec   Loss 4.4412   LearningRate 0.0113   Epoch: 13   Global Step: 164830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:19:49,924-Speed 3028.71 samples/sec   Loss 4.4064   LearningRate 0.0113   Epoch: 13   Global Step: 164840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:19:53,301-Speed 3033.41 samples/sec   Loss 4.4089   LearningRate 0.0113   Epoch: 13   Global Step: 164850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:19:56,722-Speed 2994.17 samples/sec   Loss 4.3675   LearningRate 0.0113   Epoch: 13   Global Step: 164860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:00,187-Speed 2955.83 samples/sec   Loss 4.5026   LearningRate 0.0113   Epoch: 13   Global Step: 164870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:03,607-Speed 2994.73 samples/sec   Loss 4.5000   LearningRate 0.0113   Epoch: 13   Global Step: 164880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:06,923-Speed 3089.17 samples/sec   Loss 4.4477   LearningRate 0.0113   Epoch: 13   Global Step: 164890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:10,255-Speed 3074.64 samples/sec   Loss 4.4714   LearningRate 0.0113   Epoch: 13   Global Step: 164900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:13,645-Speed 3020.96 samples/sec   Loss 4.3541   LearningRate 0.0113   Epoch: 13   Global Step: 164910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:17,006-Speed 3047.54 samples/sec   Loss 4.3124   LearningRate 0.0113   Epoch: 13   Global Step: 164920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:20,421-Speed 2999.38 samples/sec   Loss 4.4268   LearningRate 0.0113   Epoch: 13   Global Step: 164930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:23,832-Speed 3003.05 samples/sec   Loss 4.3905   LearningRate 0.0113   Epoch: 13   Global Step: 164940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:27,266-Speed 2983.04 samples/sec   Loss 4.4960   LearningRate 0.0113   Epoch: 13   Global Step: 164950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:30,732-Speed 2956.10 samples/sec   Loss 4.3773   LearningRate 0.0113   Epoch: 13   Global Step: 164960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:20:34,109-Speed 3032.46 samples/sec   Loss 4.3944   LearningRate 0.0113   Epoch: 13   Global Step: 164970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:37,505-Speed 3016.02 samples/sec   Loss 4.3915   LearningRate 0.0113   Epoch: 13   Global Step: 164980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:40,935-Speed 2986.71 samples/sec   Loss 4.4531   LearningRate 0.0113   Epoch: 13   Global Step: 164990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:44,390-Speed 2964.70 samples/sec   Loss 4.4381   LearningRate 0.0113   Epoch: 13   Global Step: 165000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:47,823-Speed 2983.62 samples/sec   Loss 4.4451   LearningRate 0.0113   Epoch: 13   Global Step: 165010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:51,217-Speed 3019.00 samples/sec   Loss 4.4852   LearningRate 0.0113   Epoch: 13   Global Step: 165020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:54,569-Speed 3055.73 samples/sec   Loss 4.4039   LearningRate 0.0113   Epoch: 13   Global Step: 165030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:20:57,996-Speed 2988.59 samples/sec   Loss 4.3498   LearningRate 0.0113   Epoch: 13   Global Step: 165040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:21:01,382-Speed 3024.96 samples/sec   Loss 4.4421   LearningRate 0.0113   Epoch: 13   Global Step: 165050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:21:04,821-Speed 2978.58 samples/sec   Loss 4.3648   LearningRate 0.0113   Epoch: 13   Global Step: 165060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:21:08,211-Speed 3021.24 samples/sec   Loss 4.4519   LearningRate 0.0113   Epoch: 13   Global Step: 165070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:21:11,569-Speed 3049.87 samples/sec   Loss 4.3968   LearningRate 0.0113   Epoch: 13   Global Step: 165080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:21:14,948-Speed 3031.58 samples/sec   Loss 4.4103   LearningRate 0.0113   Epoch: 13   Global Step: 165090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:21:18,391-Speed 2974.86 samples/sec   Loss 4.5568   LearningRate 0.0112   Epoch: 13   Global Step: 165100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:21:21,774-Speed 3027.86 samples/sec   Loss 4.4500   LearningRate 0.0112   Epoch: 13   Global Step: 165110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:21:25,199-Speed 2991.04 samples/sec   Loss 4.4264   LearningRate 0.0112   Epoch: 13   Global Step: 165120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:21:28,635-Speed 2980.12 samples/sec   Loss 4.5038   LearningRate 0.0112   Epoch: 13   Global Step: 165130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:21:32,091-Speed 2964.38 samples/sec   Loss 4.4140   LearningRate 0.0112   Epoch: 13   Global Step: 165140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:21:35,513-Speed 2993.05 samples/sec   Loss 4.4362   LearningRate 0.0112   Epoch: 13   Global Step: 165150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:21:38,931-Speed 2997.23 samples/sec   Loss 4.4282   LearningRate 0.0112   Epoch: 13   Global Step: 165160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:21:42,344-Speed 3001.67 samples/sec   Loss 4.4370   LearningRate 0.0112   Epoch: 13   Global Step: 165170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:21:45,707-Speed 3045.76 samples/sec   Loss 4.4298   LearningRate 0.0112   Epoch: 13   Global Step: 165180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:21:49,111-Speed 3009.00 samples/sec   Loss 4.4594   LearningRate 0.0112   Epoch: 13   Global Step: 165190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:21:52,514-Speed 3009.48 samples/sec   Loss 4.5008   LearningRate 0.0112   Epoch: 13   Global Step: 165200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:21:55,943-Speed 2987.48 samples/sec   Loss 4.4078   LearningRate 0.0112   Epoch: 13   Global Step: 165210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:21:59,343-Speed 3012.52 samples/sec   Loss 4.4245   LearningRate 0.0112   Epoch: 13   Global Step: 165220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:22:02,723-Speed 3029.95 samples/sec   Loss 4.5008   LearningRate 0.0112   Epoch: 13   Global Step: 165230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:22:06,105-Speed 3028.91 samples/sec   Loss 4.4079   LearningRate 0.0112   Epoch: 13   Global Step: 165240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:22:09,474-Speed 3040.38 samples/sec   Loss 4.4697   LearningRate 0.0112   Epoch: 13   Global Step: 165250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:22:12,871-Speed 3015.60 samples/sec   Loss 4.4784   LearningRate 0.0112   Epoch: 13   Global Step: 165260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:22:16,415-Speed 2889.92 samples/sec   Loss 4.5681   LearningRate 0.0112   Epoch: 13   Global Step: 165270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:22:19,832-Speed 2997.97 samples/sec   Loss 4.4690   LearningRate 0.0112   Epoch: 13   Global Step: 165280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:22:23,253-Speed 2993.58 samples/sec   Loss 4.4665   LearningRate 0.0112   Epoch: 13   Global Step: 165290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:22:26,649-Speed 3016.71 samples/sec   Loss 4.3683   LearningRate 0.0112   Epoch: 13   Global Step: 165300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:22:30,091-Speed 2975.47 samples/sec   Loss 4.4937   LearningRate 0.0112   Epoch: 13   Global Step: 165310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:22:33,533-Speed 2975.78 samples/sec   Loss 4.4876   LearningRate 0.0112   Epoch: 13   Global Step: 165320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:22:36,963-Speed 2986.67 samples/sec   Loss 4.4351   LearningRate 0.0112   Epoch: 13   Global Step: 165330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:22:40,410-Speed 2971.95 samples/sec   Loss 4.5057   LearningRate 0.0112   Epoch: 13   Global Step: 165340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:22:43,781-Speed 3038.05 samples/sec   Loss 4.4205   LearningRate 0.0112   Epoch: 13   Global Step: 165350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:22:47,110-Speed 3076.75 samples/sec   Loss 4.5855   LearningRate 0.0112   Epoch: 13   Global Step: 165360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:22:50,501-Speed 3020.94 samples/sec   Loss 4.5284   LearningRate 0.0112   Epoch: 13   Global Step: 165370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:22:53,916-Speed 2999.07 samples/sec   Loss 4.3956   LearningRate 0.0112   Epoch: 13   Global Step: 165380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:22:57,273-Speed 3051.76 samples/sec   Loss 4.5448   LearningRate 0.0112   Epoch: 13   Global Step: 165390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:23:00,607-Speed 3072.85 samples/sec   Loss 4.4804   LearningRate 0.0112   Epoch: 13   Global Step: 165400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:23:04,051-Speed 2973.94 samples/sec   Loss 4.3551   LearningRate 0.0112   Epoch: 13   Global Step: 165410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:23:07,392-Speed 3066.13 samples/sec   Loss 4.4592   LearningRate 0.0112   Epoch: 13   Global Step: 165420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:23:10,746-Speed 3054.17 samples/sec   Loss 4.4900   LearningRate 0.0112   Epoch: 13   Global Step: 165430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:23:14,083-Speed 3069.79 samples/sec   Loss 4.4976   LearningRate 0.0112   Epoch: 13   Global Step: 165440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:23:17,476-Speed 3019.59 samples/sec   Loss 4.4201   LearningRate 0.0112   Epoch: 13   Global Step: 165450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:23:20,869-Speed 3018.88 samples/sec   Loss 4.4798   LearningRate 0.0112   Epoch: 13   Global Step: 165460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:23:24,263-Speed 3017.55 samples/sec   Loss 4.4398   LearningRate 0.0111   Epoch: 13   Global Step: 165470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:23:27,733-Speed 2951.12 samples/sec   Loss 4.5107   LearningRate 0.0111   Epoch: 13   Global Step: 165480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:23:31,166-Speed 2983.83 samples/sec   Loss 4.4392   LearningRate 0.0111   Epoch: 13   Global Step: 165490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:23:34,544-Speed 3032.78 samples/sec   Loss 4.5931   LearningRate 0.0111   Epoch: 13   Global Step: 165500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:23:37,916-Speed 3036.81 samples/sec   Loss 4.5270   LearningRate 0.0111   Epoch: 13   Global Step: 165510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:23:41,268-Speed 3056.05 samples/sec   Loss 4.5672   LearningRate 0.0111   Epoch: 13   Global Step: 165520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:23:44,721-Speed 2966.93 samples/sec   Loss 4.5239   LearningRate 0.0111   Epoch: 13   Global Step: 165530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:23:48,127-Speed 3006.51 samples/sec   Loss 4.4127   LearningRate 0.0111   Epoch: 13   Global Step: 165540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:23:51,550-Speed 2992.64 samples/sec   Loss 4.4684   LearningRate 0.0111   Epoch: 13   Global Step: 165550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:23:54,957-Speed 3006.08 samples/sec   Loss 4.5290   LearningRate 0.0111   Epoch: 13   Global Step: 165560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:23:58,357-Speed 3013.19 samples/sec   Loss 4.5029   LearningRate 0.0111   Epoch: 13   Global Step: 165570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:01,696-Speed 3067.46 samples/sec   Loss 4.4068   LearningRate 0.0111   Epoch: 13   Global Step: 165580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:05,089-Speed 3019.00 samples/sec   Loss 4.4554   LearningRate 0.0111   Epoch: 13   Global Step: 165590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:08,461-Speed 3037.37 samples/sec   Loss 4.4883   LearningRate 0.0111   Epoch: 13   Global Step: 165600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:11,859-Speed 3014.39 samples/sec   Loss 4.4863   LearningRate 0.0111   Epoch: 13   Global Step: 165610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:15,238-Speed 3031.43 samples/sec   Loss 4.5023   LearningRate 0.0111   Epoch: 13   Global Step: 165620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:24:18,725-Speed 2937.54 samples/sec   Loss 4.5474   LearningRate 0.0111   Epoch: 13   Global Step: 165630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:24:22,066-Speed 3065.54 samples/sec   Loss 4.4988   LearningRate 0.0111   Epoch: 13   Global Step: 165640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:24:25,476-Speed 3003.50 samples/sec   Loss 4.6042   LearningRate 0.0111   Epoch: 13   Global Step: 165650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:28,800-Speed 3081.46 samples/sec   Loss 4.4653   LearningRate 0.0111   Epoch: 13   Global Step: 165660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:32,207-Speed 3007.12 samples/sec   Loss 4.4592   LearningRate 0.0111   Epoch: 13   Global Step: 165670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:35,578-Speed 3038.08 samples/sec   Loss 4.4958   LearningRate 0.0111   Epoch: 13   Global Step: 165680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:38,897-Speed 3085.76 samples/sec   Loss 4.4405   LearningRate 0.0111   Epoch: 13   Global Step: 165690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:42,209-Speed 3092.73 samples/sec   Loss 4.5233   LearningRate 0.0111   Epoch: 13   Global Step: 165700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:45,604-Speed 3017.23 samples/sec   Loss 4.4604   LearningRate 0.0111   Epoch: 13   Global Step: 165710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:49,068-Speed 2957.20 samples/sec   Loss 4.4502   LearningRate 0.0111   Epoch: 13   Global Step: 165720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:52,483-Speed 2999.44 samples/sec   Loss 4.5466   LearningRate 0.0111   Epoch: 13   Global Step: 165730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:55,901-Speed 2996.57 samples/sec   Loss 4.5944   LearningRate 0.0111   Epoch: 13   Global Step: 165740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:24:59,267-Speed 3042.96 samples/sec   Loss 4.4831   LearningRate 0.0111   Epoch: 13   Global Step: 165750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:25:02,654-Speed 3023.75 samples/sec   Loss 4.6075   LearningRate 0.0111   Epoch: 13   Global Step: 165760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:25:06,065-Speed 3003.24 samples/sec   Loss 4.4626   LearningRate 0.0111   Epoch: 13   Global Step: 165770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:25:09,466-Speed 3012.02 samples/sec   Loss 4.5134   LearningRate 0.0111   Epoch: 13   Global Step: 165780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:12,912-Speed 2972.04 samples/sec   Loss 4.4895   LearningRate 0.0111   Epoch: 13   Global Step: 165790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:16,342-Speed 2987.08 samples/sec   Loss 4.4423   LearningRate 0.0111   Epoch: 13   Global Step: 165800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:19,739-Speed 3015.47 samples/sec   Loss 4.5215   LearningRate 0.0111   Epoch: 13   Global Step: 165810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:23,140-Speed 3012.21 samples/sec   Loss 4.4326   LearningRate 0.0111   Epoch: 13   Global Step: 165820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:26,492-Speed 3055.45 samples/sec   Loss 4.6215   LearningRate 0.0111   Epoch: 13   Global Step: 165830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:29,879-Speed 3024.57 samples/sec   Loss 4.5065   LearningRate 0.0111   Epoch: 13   Global Step: 165840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:33,273-Speed 3018.22 samples/sec   Loss 4.5229   LearningRate 0.0110   Epoch: 13   Global Step: 165850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:36,762-Speed 2935.88 samples/sec   Loss 4.4156   LearningRate 0.0110   Epoch: 13   Global Step: 165860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:40,154-Speed 3019.19 samples/sec   Loss 4.6155   LearningRate 0.0110   Epoch: 13   Global Step: 165870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:43,511-Speed 3051.47 samples/sec   Loss 4.4358   LearningRate 0.0110   Epoch: 13   Global Step: 165880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:25:46,887-Speed 3033.64 samples/sec   Loss 4.4887   LearningRate 0.0110   Epoch: 13   Global Step: 165890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:50,244-Speed 3051.60 samples/sec   Loss 4.5458   LearningRate 0.0110   Epoch: 13   Global Step: 165900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:53,755-Speed 2917.16 samples/sec   Loss 4.5159   LearningRate 0.0110   Epoch: 13   Global Step: 165910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:25:57,107-Speed 3055.83 samples/sec   Loss 4.4888   LearningRate 0.0110   Epoch: 13   Global Step: 165920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:00,458-Speed 3057.03 samples/sec   Loss 4.5196   LearningRate 0.0110   Epoch: 13   Global Step: 165930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:03,783-Speed 3080.48 samples/sec   Loss 4.5546   LearningRate 0.0110   Epoch: 13   Global Step: 165940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:07,126-Speed 3064.20 samples/sec   Loss 4.5065   LearningRate 0.0110   Epoch: 13   Global Step: 165950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:10,488-Speed 3046.33 samples/sec   Loss 4.5487   LearningRate 0.0110   Epoch: 13   Global Step: 165960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:13,884-Speed 3016.35 samples/sec   Loss 4.4583   LearningRate 0.0110   Epoch: 13   Global Step: 165970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:17,334-Speed 2968.77 samples/sec   Loss 4.5932   LearningRate 0.0110   Epoch: 13   Global Step: 165980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:20,751-Speed 2997.77 samples/sec   Loss 4.5661   LearningRate 0.0110   Epoch: 13   Global Step: 165990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:26:24,130-Speed 3031.61 samples/sec   Loss 4.5338   LearningRate 0.0110   Epoch: 13   Global Step: 166000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:26:27,500-Speed 3039.68 samples/sec   Loss 4.5264   LearningRate 0.0110   Epoch: 13   Global Step: 166010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:30,879-Speed 3031.40 samples/sec   Loss 4.5138   LearningRate 0.0110   Epoch: 13   Global Step: 166020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:34,326-Speed 2971.42 samples/sec   Loss 4.5849   LearningRate 0.0110   Epoch: 13   Global Step: 166030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:37,717-Speed 3020.08 samples/sec   Loss 4.5624   LearningRate 0.0110   Epoch: 13   Global Step: 166040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:41,060-Speed 3064.63 samples/sec   Loss 4.4967   LearningRate 0.0110   Epoch: 13   Global Step: 166050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:44,433-Speed 3036.10 samples/sec   Loss 4.3855   LearningRate 0.0110   Epoch: 13   Global Step: 166060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:47,798-Speed 3044.57 samples/sec   Loss 4.6836   LearningRate 0.0110   Epoch: 13   Global Step: 166070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:51,287-Speed 2935.64 samples/sec   Loss 4.5072   LearningRate 0.0110   Epoch: 13   Global Step: 166080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:54,757-Speed 2951.95 samples/sec   Loss 4.6019   LearningRate 0.0110   Epoch: 13   Global Step: 166090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:26:58,203-Speed 2971.86 samples/sec   Loss 4.4947   LearningRate 0.0110   Epoch: 13   Global Step: 166100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:01,592-Speed 3022.89 samples/sec   Loss 4.5380   LearningRate 0.0110   Epoch: 13   Global Step: 166110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:27:04,937-Speed 3062.84 samples/sec   Loss 4.4379   LearningRate 0.0110   Epoch: 13   Global Step: 166120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:08,310-Speed 3036.29 samples/sec   Loss 4.5667   LearningRate 0.0110   Epoch: 13   Global Step: 166130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:11,782-Speed 2949.45 samples/sec   Loss 4.5812   LearningRate 0.0110   Epoch: 13   Global Step: 166140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:15,134-Speed 3057.20 samples/sec   Loss 4.4899   LearningRate 0.0110   Epoch: 13   Global Step: 166150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:18,487-Speed 3054.29 samples/sec   Loss 4.5247   LearningRate 0.0110   Epoch: 13   Global Step: 166160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:21,887-Speed 3012.73 samples/sec   Loss 4.5967   LearningRate 0.0110   Epoch: 13   Global Step: 166170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:25,228-Speed 3066.24 samples/sec   Loss 4.4822   LearningRate 0.0110   Epoch: 13   Global Step: 166180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:28,561-Speed 3072.51 samples/sec   Loss 4.5482   LearningRate 0.0110   Epoch: 13   Global Step: 166190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:31,957-Speed 3016.61 samples/sec   Loss 4.4725   LearningRate 0.0110   Epoch: 13   Global Step: 166200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:35,357-Speed 3012.47 samples/sec   Loss 4.4883   LearningRate 0.0110   Epoch: 13   Global Step: 166210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:38,794-Speed 2979.94 samples/sec   Loss 4.4964   LearningRate 0.0109   Epoch: 13   Global Step: 166220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:42,183-Speed 3022.82 samples/sec   Loss 4.5076   LearningRate 0.0109   Epoch: 13   Global Step: 166230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:45,618-Speed 2982.09 samples/sec   Loss 4.4489   LearningRate 0.0109   Epoch: 13   Global Step: 166240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:49,049-Speed 2985.69 samples/sec   Loss 4.4462   LearningRate 0.0109   Epoch: 13   Global Step: 166250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:52,462-Speed 3001.24 samples/sec   Loss 4.5575   LearningRate 0.0109   Epoch: 13   Global Step: 166260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:55,897-Speed 2981.56 samples/sec   Loss 4.5595   LearningRate 0.0109   Epoch: 13   Global Step: 166270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:27:59,309-Speed 3002.58 samples/sec   Loss 4.4487   LearningRate 0.0109   Epoch: 13   Global Step: 166280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:28:02,771-Speed 2958.30 samples/sec   Loss 4.5220   LearningRate 0.0109   Epoch: 13   Global Step: 166290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:28:06,258-Speed 2937.20 samples/sec   Loss 4.5308   LearningRate 0.0109   Epoch: 13   Global Step: 166300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:28:09,675-Speed 2997.71 samples/sec   Loss 4.5160   LearningRate 0.0109   Epoch: 13   Global Step: 166310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:28:13,040-Speed 3044.47 samples/sec   Loss 4.4554   LearningRate 0.0109   Epoch: 13   Global Step: 166320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:28:16,424-Speed 3026.97 samples/sec   Loss 4.4003   LearningRate 0.0109   Epoch: 13   Global Step: 166330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:28:19,819-Speed 3016.89 samples/sec   Loss 4.6344   LearningRate 0.0109   Epoch: 13   Global Step: 166340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:28:23,216-Speed 3015.62 samples/sec   Loss 4.4538   LearningRate 0.0109   Epoch: 13   Global Step: 166350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:28:26,572-Speed 3051.25 samples/sec   Loss 4.5815   LearningRate 0.0109   Epoch: 13   Global Step: 166360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:28:29,929-Speed 3051.06 samples/sec   Loss 4.6138   LearningRate 0.0109   Epoch: 13   Global Step: 166370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:28:33,307-Speed 3032.23 samples/sec   Loss 4.4975   LearningRate 0.0109   Epoch: 13   Global Step: 166380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:28:36,700-Speed 3019.87 samples/sec   Loss 4.6036   LearningRate 0.0109   Epoch: 13   Global Step: 166390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:28:40,038-Speed 3068.62 samples/sec   Loss 4.6129   LearningRate 0.0109   Epoch: 13   Global Step: 166400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:28:43,450-Speed 3001.39 samples/sec   Loss 4.6060   LearningRate 0.0109   Epoch: 13   Global Step: 166410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:28:46,828-Speed 3032.62 samples/sec   Loss 4.6040   LearningRate 0.0109   Epoch: 13   Global Step: 166420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:28:50,170-Speed 3064.61 samples/sec   Loss 4.5862   LearningRate 0.0109   Epoch: 13   Global Step: 166430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:28:53,643-Speed 2949.28 samples/sec   Loss 4.5981   LearningRate 0.0109   Epoch: 13   Global Step: 166440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:28:57,035-Speed 3019.64 samples/sec   Loss 4.5560   LearningRate 0.0109   Epoch: 13   Global Step: 166450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:29:00,386-Speed 3056.41 samples/sec   Loss 4.5280   LearningRate 0.0109   Epoch: 13   Global Step: 166460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:29:03,722-Speed 3070.61 samples/sec   Loss 4.6351   LearningRate 0.0109   Epoch: 13   Global Step: 166470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:29:07,115-Speed 3019.13 samples/sec   Loss 4.5987   LearningRate 0.0109   Epoch: 13   Global Step: 166480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:29:10,580-Speed 2956.03 samples/sec   Loss 4.6074   LearningRate 0.0109   Epoch: 13   Global Step: 166490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:29:14,046-Speed 2955.47 samples/sec   Loss 4.5829   LearningRate 0.0109   Epoch: 13   Global Step: 166500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:29:17,377-Speed 3074.97 samples/sec   Loss 4.6318   LearningRate 0.0109   Epoch: 13   Global Step: 166510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:29:20,814-Speed 2980.36 samples/sec   Loss 4.6186   LearningRate 0.0109   Epoch: 13   Global Step: 166520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:29:24,184-Speed 3039.42 samples/sec   Loss 4.5383   LearningRate 0.0109   Epoch: 13   Global Step: 166530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:29:27,593-Speed 3003.87 samples/sec   Loss 4.5696   LearningRate 0.0109   Epoch: 13   Global Step: 166540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:29:31,009-Speed 2998.83 samples/sec   Loss 4.5069   LearningRate 0.0109   Epoch: 13   Global Step: 166550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:29:34,453-Speed 2973.81 samples/sec   Loss 4.5556   LearningRate 0.0109   Epoch: 13   Global Step: 166560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:29:37,785-Speed 3074.98 samples/sec   Loss 4.5884   LearningRate 0.0109   Epoch: 13   Global Step: 166570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:29:41,230-Speed 2973.26 samples/sec   Loss 4.5500   LearningRate 0.0109   Epoch: 13   Global Step: 166580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:29:44,697-Speed 2955.05 samples/sec   Loss 4.5526   LearningRate 0.0109   Epoch: 13   Global Step: 166590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:29:48,154-Speed 2962.75 samples/sec   Loss 4.5876   LearningRate 0.0108   Epoch: 13   Global Step: 166600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:29:51,586-Speed 2984.61 samples/sec   Loss 4.5297   LearningRate 0.0108   Epoch: 13   Global Step: 166610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:29:55,004-Speed 2997.17 samples/sec   Loss 4.6443   LearningRate 0.0108   Epoch: 13   Global Step: 166620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:29:58,402-Speed 3014.15 samples/sec   Loss 4.5607   LearningRate 0.0108   Epoch: 13   Global Step: 166630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:30:01,873-Speed 2951.02 samples/sec   Loss 4.4849   LearningRate 0.0108   Epoch: 13   Global Step: 166640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:30:05,243-Speed 3039.13 samples/sec   Loss 4.6407   LearningRate 0.0108   Epoch: 13   Global Step: 166650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:30:08,596-Speed 3054.78 samples/sec   Loss 4.5034   LearningRate 0.0108   Epoch: 13   Global Step: 166660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:30:11,951-Speed 3052.87 samples/sec   Loss 4.5381   LearningRate 0.0108   Epoch: 13   Global Step: 166670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:30:15,288-Speed 3069.56 samples/sec   Loss 4.5272   LearningRate 0.0108   Epoch: 13   Global Step: 166680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:30:18,642-Speed 3054.22 samples/sec   Loss 4.5912   LearningRate 0.0108   Epoch: 13   Global Step: 166690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:30:21,943-Speed 3103.00 samples/sec   Loss 4.5526   LearningRate 0.0108   Epoch: 13   Global Step: 166700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:30:25,318-Speed 3035.04 samples/sec   Loss 4.3943   LearningRate 0.0108   Epoch: 13   Global Step: 166710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:30:28,751-Speed 2983.41 samples/sec   Loss 4.5808   LearningRate 0.0108   Epoch: 13   Global Step: 166720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:30:32,103-Speed 3055.68 samples/sec   Loss 4.5192   LearningRate 0.0108   Epoch: 13   Global Step: 166730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:30:35,433-Speed 3075.66 samples/sec   Loss 4.5537   LearningRate 0.0108   Epoch: 13   Global Step: 166740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:30:38,758-Speed 3080.68 samples/sec   Loss 4.5772   LearningRate 0.0108   Epoch: 13   Global Step: 166750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:30:42,214-Speed 2964.05 samples/sec   Loss 4.5409   LearningRate 0.0108   Epoch: 13   Global Step: 166760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:30:45,606-Speed 3019.75 samples/sec   Loss 4.6350   LearningRate 0.0108   Epoch: 13   Global Step: 166770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:30:48,970-Speed 3044.78 samples/sec   Loss 4.6320   LearningRate 0.0108   Epoch: 13   Global Step: 166780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:30:52,358-Speed 3023.47 samples/sec   Loss 4.6388   LearningRate 0.0108   Epoch: 13   Global Step: 166790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:30:55,712-Speed 3053.17 samples/sec   Loss 4.5111   LearningRate 0.0108   Epoch: 13   Global Step: 166800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:30:59,148-Speed 2981.58 samples/sec   Loss 4.5163   LearningRate 0.0108   Epoch: 13   Global Step: 166810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:02,509-Speed 3046.91 samples/sec   Loss 4.4694   LearningRate 0.0108   Epoch: 13   Global Step: 166820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:05,942-Speed 2984.15 samples/sec   Loss 4.6529   LearningRate 0.0108   Epoch: 13   Global Step: 166830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:09,347-Speed 3008.08 samples/sec   Loss 4.4540   LearningRate 0.0108   Epoch: 13   Global Step: 166840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:12,774-Speed 2988.70 samples/sec   Loss 4.5638   LearningRate 0.0108   Epoch: 13   Global Step: 166850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:16,190-Speed 2997.66 samples/sec   Loss 4.5266   LearningRate 0.0108   Epoch: 13   Global Step: 166860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:19,516-Speed 3079.92 samples/sec   Loss 4.5473   LearningRate 0.0108   Epoch: 13   Global Step: 166870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:22,887-Speed 3039.12 samples/sec   Loss 4.5484   LearningRate 0.0108   Epoch: 13   Global Step: 166880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:26,245-Speed 3050.33 samples/sec   Loss 4.6309   LearningRate 0.0108   Epoch: 13   Global Step: 166890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:29,601-Speed 3051.52 samples/sec   Loss 4.5337   LearningRate 0.0108   Epoch: 13   Global Step: 166900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:31:32,951-Speed 3057.99 samples/sec   Loss 4.5713   LearningRate 0.0108   Epoch: 13   Global Step: 166910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:31:36,346-Speed 3016.72 samples/sec   Loss 4.5555   LearningRate 0.0108   Epoch: 13   Global Step: 166920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:31:39,710-Speed 3045.64 samples/sec   Loss 4.5190   LearningRate 0.0108   Epoch: 13   Global Step: 166930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:43,131-Speed 2994.25 samples/sec   Loss 4.6400   LearningRate 0.0108   Epoch: 13   Global Step: 166940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:46,512-Speed 3029.35 samples/sec   Loss 4.5815   LearningRate 0.0108   Epoch: 13   Global Step: 166950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:49,893-Speed 3030.08 samples/sec   Loss 4.6430   LearningRate 0.0108   Epoch: 13   Global Step: 166960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:53,263-Speed 3039.84 samples/sec   Loss 4.6371   LearningRate 0.0108   Epoch: 13   Global Step: 166970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:56,602-Speed 3067.34 samples/sec   Loss 4.5611   LearningRate 0.0107   Epoch: 13   Global Step: 166980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:31:59,928-Speed 3079.71 samples/sec   Loss 4.6075   LearningRate 0.0107   Epoch: 13   Global Step: 166990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:32:03,312-Speed 3026.60 samples/sec   Loss 4.6673   LearningRate 0.0107   Epoch: 13   Global Step: 167000   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 17:32:06,701-Speed 3022.37 samples/sec   Loss 4.5138   LearningRate 0.0107   Epoch: 13   Global Step: 167010   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 17:32:10,053-Speed 3056.02 samples/sec   Loss 4.6230   LearningRate 0.0107   Epoch: 13   Global Step: 167020   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 17:32:13,388-Speed 3071.54 samples/sec   Loss 4.6504   LearningRate 0.0107   Epoch: 13   Global Step: 167030   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 17:32:16,762-Speed 3035.81 samples/sec   Loss 4.7198   LearningRate 0.0107   Epoch: 13   Global Step: 167040   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 17:32:20,196-Speed 2982.80 samples/sec   Loss 4.6085   LearningRate 0.0107   Epoch: 13   Global Step: 167050   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 17:32:23,576-Speed 3030.03 samples/sec   Loss 4.4994   LearningRate 0.0107   Epoch: 13   Global Step: 167060   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 17:32:26,953-Speed 3033.40 samples/sec   Loss 4.5751   LearningRate 0.0107   Epoch: 13   Global Step: 167070   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 17:32:30,378-Speed 2990.96 samples/sec   Loss 4.5382   LearningRate 0.0107   Epoch: 13   Global Step: 167080   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 17:32:33,722-Speed 3062.81 samples/sec   Loss 4.6177   LearningRate 0.0107   Epoch: 13   Global Step: 167090   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 17:32:37,055-Speed 3072.75 samples/sec   Loss 4.4968   LearningRate 0.0107   Epoch: 13   Global Step: 167100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:32:40,484-Speed 2987.96 samples/sec   Loss 4.4863   LearningRate 0.0107   Epoch: 13   Global Step: 167110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:32:43,872-Speed 3023.64 samples/sec   Loss 4.5287   LearningRate 0.0107   Epoch: 13   Global Step: 167120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:32:47,272-Speed 3012.36 samples/sec   Loss 4.5891   LearningRate 0.0107   Epoch: 13   Global Step: 167130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:32:50,720-Speed 2970.72 samples/sec   Loss 4.5433   LearningRate 0.0107   Epoch: 13   Global Step: 167140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:32:54,161-Speed 2976.50 samples/sec   Loss 4.5630   LearningRate 0.0107   Epoch: 13   Global Step: 167150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:32:57,481-Speed 3085.62 samples/sec   Loss 4.5723   LearningRate 0.0107   Epoch: 13   Global Step: 167160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:33:00,893-Speed 3001.77 samples/sec   Loss 4.6022   LearningRate 0.0107   Epoch: 13   Global Step: 167170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:33:04,262-Speed 3039.96 samples/sec   Loss 4.5533   LearningRate 0.0107   Epoch: 13   Global Step: 167180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:33:07,586-Speed 3082.01 samples/sec   Loss 4.5559   LearningRate 0.0107   Epoch: 13   Global Step: 167190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:33:10,925-Speed 3067.22 samples/sec   Loss 4.6046   LearningRate 0.0107   Epoch: 13   Global Step: 167200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:33:14,310-Speed 3026.19 samples/sec   Loss 4.6382   LearningRate 0.0107   Epoch: 13   Global Step: 167210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:33:17,668-Speed 3049.70 samples/sec   Loss 4.5267   LearningRate 0.0107   Epoch: 13   Global Step: 167220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:33:20,995-Speed 3079.08 samples/sec   Loss 4.5468   LearningRate 0.0107   Epoch: 13   Global Step: 167230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:33:24,402-Speed 3006.20 samples/sec   Loss 4.6134   LearningRate 0.0107   Epoch: 13   Global Step: 167240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:33:27,832-Speed 2986.73 samples/sec   Loss 4.5706   LearningRate 0.0107   Epoch: 13   Global Step: 167250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:33:31,158-Speed 3079.01 samples/sec   Loss 4.5368   LearningRate 0.0107   Epoch: 13   Global Step: 167260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:33:34,481-Speed 3083.23 samples/sec   Loss 4.5217   LearningRate 0.0107   Epoch: 13   Global Step: 167270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:33:37,859-Speed 3032.07 samples/sec   Loss 4.6447   LearningRate 0.0107   Epoch: 13   Global Step: 167280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:33:41,231-Speed 3038.19 samples/sec   Loss 4.6542   LearningRate 0.0107   Epoch: 13   Global Step: 167290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:33:44,545-Speed 3090.13 samples/sec   Loss 4.4947   LearningRate 0.0107   Epoch: 13   Global Step: 167300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:33:47,944-Speed 3014.15 samples/sec   Loss 4.5461   LearningRate 0.0107   Epoch: 13   Global Step: 167310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:33:51,314-Speed 3039.36 samples/sec   Loss 4.6451   LearningRate 0.0107   Epoch: 13   Global Step: 167320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:33:54,727-Speed 3001.13 samples/sec   Loss 4.6361   LearningRate 0.0107   Epoch: 13   Global Step: 167330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:33:58,062-Speed 3071.68 samples/sec   Loss 4.5049   LearningRate 0.0107   Epoch: 13   Global Step: 167340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:34:01,415-Speed 3054.73 samples/sec   Loss 4.6247   LearningRate 0.0106   Epoch: 13   Global Step: 167350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:34:04,826-Speed 3003.04 samples/sec   Loss 4.4603   LearningRate 0.0106   Epoch: 13   Global Step: 167360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 17:34:08,256-Speed 2986.04 samples/sec   Loss 4.5709   LearningRate 0.0106   Epoch: 13   Global Step: 167370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:34:11,635-Speed 3031.53 samples/sec   Loss 4.4940   LearningRate 0.0106   Epoch: 13   Global Step: 167380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:34:14,954-Speed 3085.49 samples/sec   Loss 4.6354   LearningRate 0.0106   Epoch: 13   Global Step: 167390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:34:18,418-Speed 2957.78 samples/sec   Loss 4.5691   LearningRate 0.0106   Epoch: 13   Global Step: 167400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:34:21,868-Speed 2968.60 samples/sec   Loss 4.6041   LearningRate 0.0106   Epoch: 13   Global Step: 167410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:34:25,245-Speed 3032.97 samples/sec   Loss 4.5932   LearningRate 0.0106   Epoch: 13   Global Step: 167420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:34:28,628-Speed 3028.21 samples/sec   Loss 4.5764   LearningRate 0.0106   Epoch: 13   Global Step: 167430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:34:32,068-Speed 2978.48 samples/sec   Loss 4.5921   LearningRate 0.0106   Epoch: 13   Global Step: 167440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:34:35,528-Speed 2959.49 samples/sec   Loss 4.5670   LearningRate 0.0106   Epoch: 13   Global Step: 167450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:34:38,956-Speed 2988.61 samples/sec   Loss 4.5794   LearningRate 0.0106   Epoch: 13   Global Step: 167460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:34:42,350-Speed 3017.75 samples/sec   Loss 4.5462   LearningRate 0.0106   Epoch: 13   Global Step: 167470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:34:45,738-Speed 3023.16 samples/sec   Loss 4.5621   LearningRate 0.0106   Epoch: 13   Global Step: 167480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:34:49,152-Speed 2999.93 samples/sec   Loss 4.6757   LearningRate 0.0106   Epoch: 13   Global Step: 167490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:34:52,551-Speed 3014.04 samples/sec   Loss 4.5754   LearningRate 0.0106   Epoch: 13   Global Step: 167500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:34:55,954-Speed 3009.82 samples/sec   Loss 4.5833   LearningRate 0.0106   Epoch: 13   Global Step: 167510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:34:59,333-Speed 3031.77 samples/sec   Loss 4.5975   LearningRate 0.0106   Epoch: 13   Global Step: 167520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:02,712-Speed 3031.03 samples/sec   Loss 4.5612   LearningRate 0.0106   Epoch: 13   Global Step: 167530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:06,023-Speed 3094.17 samples/sec   Loss 4.6211   LearningRate 0.0106   Epoch: 13   Global Step: 167540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:09,423-Speed 3012.57 samples/sec   Loss 4.5463   LearningRate 0.0106   Epoch: 13   Global Step: 167550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:12,731-Speed 3095.91 samples/sec   Loss 4.6337   LearningRate 0.0106   Epoch: 13   Global Step: 167560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:16,074-Speed 3064.10 samples/sec   Loss 4.6161   LearningRate 0.0106   Epoch: 13   Global Step: 167570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:19,506-Speed 2984.13 samples/sec   Loss 4.6440   LearningRate 0.0106   Epoch: 13   Global Step: 167580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:22,922-Speed 2998.69 samples/sec   Loss 4.6811   LearningRate 0.0106   Epoch: 13   Global Step: 167590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:26,241-Speed 3087.34 samples/sec   Loss 4.5080   LearningRate 0.0106   Epoch: 13   Global Step: 167600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:29,592-Speed 3055.83 samples/sec   Loss 4.5380   LearningRate 0.0106   Epoch: 13   Global Step: 167610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:35:32,938-Speed 3061.27 samples/sec   Loss 4.4111   LearningRate 0.0106   Epoch: 13   Global Step: 167620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:36,334-Speed 3016.16 samples/sec   Loss 4.5159   LearningRate 0.0106   Epoch: 13   Global Step: 167630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:39,676-Speed 3064.88 samples/sec   Loss 4.6393   LearningRate 0.0106   Epoch: 13   Global Step: 167640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:43,037-Speed 3047.66 samples/sec   Loss 4.6093   LearningRate 0.0106   Epoch: 13   Global Step: 167650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:46,457-Speed 2995.90 samples/sec   Loss 4.6343   LearningRate 0.0106   Epoch: 13   Global Step: 167660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:49,811-Speed 3052.98 samples/sec   Loss 4.6711   LearningRate 0.0106   Epoch: 13   Global Step: 167670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:53,221-Speed 3004.09 samples/sec   Loss 4.5924   LearningRate 0.0106   Epoch: 13   Global Step: 167680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:56,548-Speed 3078.66 samples/sec   Loss 4.4959   LearningRate 0.0106   Epoch: 13   Global Step: 167690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:35:59,894-Speed 3060.83 samples/sec   Loss 4.5857   LearningRate 0.0106   Epoch: 13   Global Step: 167700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:03,207-Speed 3092.40 samples/sec   Loss 4.6130   LearningRate 0.0106   Epoch: 13   Global Step: 167710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:06,537-Speed 3076.16 samples/sec   Loss 4.5739   LearningRate 0.0106   Epoch: 13   Global Step: 167720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:36:09,883-Speed 3061.46 samples/sec   Loss 4.6858   LearningRate 0.0106   Epoch: 13   Global Step: 167730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:36:13,288-Speed 3007.56 samples/sec   Loss 4.5989   LearningRate 0.0105   Epoch: 13   Global Step: 167740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:36:16,680-Speed 3019.69 samples/sec   Loss 4.6267   LearningRate 0.0105   Epoch: 13   Global Step: 167750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:36:20,025-Speed 3062.10 samples/sec   Loss 4.6138   LearningRate 0.0105   Epoch: 13   Global Step: 167760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:23,559-Speed 2898.44 samples/sec   Loss 4.4763   LearningRate 0.0105   Epoch: 13   Global Step: 167770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:26,983-Speed 2991.10 samples/sec   Loss 4.5749   LearningRate 0.0105   Epoch: 13   Global Step: 167780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:30,409-Speed 2990.51 samples/sec   Loss 4.6793   LearningRate 0.0105   Epoch: 13   Global Step: 167790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:33,779-Speed 3039.11 samples/sec   Loss 4.6647   LearningRate 0.0105   Epoch: 13   Global Step: 167800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:37,185-Speed 3007.24 samples/sec   Loss 4.6160   LearningRate 0.0105   Epoch: 13   Global Step: 167810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:40,647-Speed 2958.59 samples/sec   Loss 4.6965   LearningRate 0.0105   Epoch: 13   Global Step: 167820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:44,054-Speed 3006.75 samples/sec   Loss 4.7258   LearningRate 0.0105   Epoch: 13   Global Step: 167830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:47,463-Speed 3005.07 samples/sec   Loss 4.6712   LearningRate 0.0105   Epoch: 13   Global Step: 167840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:50,768-Speed 3099.01 samples/sec   Loss 4.6629   LearningRate 0.0105   Epoch: 13   Global Step: 167850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:54,158-Speed 3020.98 samples/sec   Loss 4.6913   LearningRate 0.0105   Epoch: 13   Global Step: 167860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:36:57,551-Speed 3019.08 samples/sec   Loss 4.6805   LearningRate 0.0105   Epoch: 13   Global Step: 167870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:00,916-Speed 3043.94 samples/sec   Loss 4.6329   LearningRate 0.0105   Epoch: 13   Global Step: 167880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:04,280-Speed 3044.41 samples/sec   Loss 4.5875   LearningRate 0.0105   Epoch: 13   Global Step: 167890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:07,664-Speed 3028.13 samples/sec   Loss 4.7088   LearningRate 0.0105   Epoch: 13   Global Step: 167900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:11,039-Speed 3034.89 samples/sec   Loss 4.5169   LearningRate 0.0105   Epoch: 13   Global Step: 167910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:14,382-Speed 3063.22 samples/sec   Loss 4.5864   LearningRate 0.0105   Epoch: 13   Global Step: 167920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:17,711-Speed 3077.78 samples/sec   Loss 4.6502   LearningRate 0.0105   Epoch: 13   Global Step: 167930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:21,062-Speed 3056.54 samples/sec   Loss 4.5059   LearningRate 0.0105   Epoch: 13   Global Step: 167940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:24,414-Speed 3055.13 samples/sec   Loss 4.6067   LearningRate 0.0105   Epoch: 13   Global Step: 167950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:27,740-Speed 3079.48 samples/sec   Loss 4.6473   LearningRate 0.0105   Epoch: 13   Global Step: 167960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:37:31,097-Speed 3051.79 samples/sec   Loss 4.5194   LearningRate 0.0105   Epoch: 13   Global Step: 167970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:34,419-Speed 3083.02 samples/sec   Loss 4.5494   LearningRate 0.0105   Epoch: 13   Global Step: 167980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:37,812-Speed 3018.35 samples/sec   Loss 4.4902   LearningRate 0.0105   Epoch: 13   Global Step: 167990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:41,233-Speed 2994.24 samples/sec   Loss 4.5472   LearningRate 0.0105   Epoch: 13   Global Step: 168000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:44,600-Speed 3042.72 samples/sec   Loss 4.6879   LearningRate 0.0105   Epoch: 13   Global Step: 168010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:48,046-Speed 2972.46 samples/sec   Loss 4.6343   LearningRate 0.0105   Epoch: 13   Global Step: 168020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:51,458-Speed 3002.29 samples/sec   Loss 4.6590   LearningRate 0.0105   Epoch: 13   Global Step: 168030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:54,807-Speed 3058.23 samples/sec   Loss 4.5428   LearningRate 0.0105   Epoch: 13   Global Step: 168040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:37:58,148-Speed 3065.55 samples/sec   Loss 4.6149   LearningRate 0.0105   Epoch: 13   Global Step: 168050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:01,460-Speed 3093.22 samples/sec   Loss 4.6564   LearningRate 0.0105   Epoch: 13   Global Step: 168060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:04,832-Speed 3037.18 samples/sec   Loss 4.5927   LearningRate 0.0105   Epoch: 13   Global Step: 168070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:38:08,127-Speed 3108.93 samples/sec   Loss 4.5164   LearningRate 0.0105   Epoch: 13   Global Step: 168080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:11,457-Speed 3075.65 samples/sec   Loss 4.5812   LearningRate 0.0105   Epoch: 13   Global Step: 168090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:14,787-Speed 3076.20 samples/sec   Loss 4.6232   LearningRate 0.0105   Epoch: 13   Global Step: 168100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:18,150-Speed 3045.64 samples/sec   Loss 4.6374   LearningRate 0.0105   Epoch: 13   Global Step: 168110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:21,508-Speed 3050.20 samples/sec   Loss 4.7475   LearningRate 0.0104   Epoch: 13   Global Step: 168120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:24,961-Speed 2966.61 samples/sec   Loss 4.5352   LearningRate 0.0104   Epoch: 13   Global Step: 168130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:28,374-Speed 3000.26 samples/sec   Loss 4.6334   LearningRate 0.0104   Epoch: 13   Global Step: 168140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:31,715-Speed 3066.67 samples/sec   Loss 4.6286   LearningRate 0.0104   Epoch: 13   Global Step: 168150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:35,135-Speed 2994.67 samples/sec   Loss 4.5208   LearningRate 0.0104   Epoch: 13   Global Step: 168160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:38,463-Speed 3077.74 samples/sec   Loss 4.6572   LearningRate 0.0104   Epoch: 13   Global Step: 168170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:41,848-Speed 3026.45 samples/sec   Loss 4.5946   LearningRate 0.0104   Epoch: 13   Global Step: 168180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:38:45,211-Speed 3045.92 samples/sec   Loss 4.7224   LearningRate 0.0104   Epoch: 13   Global Step: 168190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:38:48,560-Speed 3058.58 samples/sec   Loss 4.5820   LearningRate 0.0104   Epoch: 13   Global Step: 168200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:51,914-Speed 3053.90 samples/sec   Loss 4.5831   LearningRate 0.0104   Epoch: 13   Global Step: 168210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:55,222-Speed 3096.08 samples/sec   Loss 4.6602   LearningRate 0.0104   Epoch: 13   Global Step: 168220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:38:58,574-Speed 3055.54 samples/sec   Loss 4.5634   LearningRate 0.0104   Epoch: 13   Global Step: 168230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:01,961-Speed 3025.38 samples/sec   Loss 4.6675   LearningRate 0.0104   Epoch: 13   Global Step: 168240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:05,399-Speed 2979.04 samples/sec   Loss 4.6265   LearningRate 0.0104   Epoch: 13   Global Step: 168250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:08,763-Speed 3045.00 samples/sec   Loss 4.7103   LearningRate 0.0104   Epoch: 13   Global Step: 168260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:12,166-Speed 3009.60 samples/sec   Loss 4.6328   LearningRate 0.0104   Epoch: 13   Global Step: 168270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:15,533-Speed 3042.35 samples/sec   Loss 4.6958   LearningRate 0.0104   Epoch: 13   Global Step: 168280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:18,897-Speed 3045.32 samples/sec   Loss 4.6078   LearningRate 0.0104   Epoch: 13   Global Step: 168290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:22,217-Speed 3084.46 samples/sec   Loss 4.5929   LearningRate 0.0104   Epoch: 13   Global Step: 168300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:39:25,507-Speed 3113.41 samples/sec   Loss 4.5674   LearningRate 0.0104   Epoch: 13   Global Step: 168310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:28,852-Speed 3062.66 samples/sec   Loss 4.5979   LearningRate 0.0104   Epoch: 13   Global Step: 168320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:32,375-Speed 2907.15 samples/sec   Loss 4.6819   LearningRate 0.0104   Epoch: 13   Global Step: 168330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:35,704-Speed 3076.84 samples/sec   Loss 4.6249   LearningRate 0.0104   Epoch: 13   Global Step: 168340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:39,134-Speed 2987.12 samples/sec   Loss 4.5648   LearningRate 0.0104   Epoch: 13   Global Step: 168350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:42,614-Speed 2943.57 samples/sec   Loss 4.5976   LearningRate 0.0104   Epoch: 13   Global Step: 168360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:46,089-Speed 2947.20 samples/sec   Loss 4.5872   LearningRate 0.0104   Epoch: 13   Global Step: 168370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:49,510-Speed 2994.55 samples/sec   Loss 4.7786   LearningRate 0.0104   Epoch: 13   Global Step: 168380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:52,931-Speed 2994.39 samples/sec   Loss 4.6498   LearningRate 0.0104   Epoch: 13   Global Step: 168390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:56,332-Speed 3011.06 samples/sec   Loss 4.6554   LearningRate 0.0104   Epoch: 13   Global Step: 168400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:39:59,777-Speed 2973.71 samples/sec   Loss 4.5838   LearningRate 0.0104   Epoch: 13   Global Step: 168410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:40:03,103-Speed 3079.93 samples/sec   Loss 4.6508   LearningRate 0.0104   Epoch: 13   Global Step: 168420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:40:06,514-Speed 3002.54 samples/sec   Loss 4.6571   LearningRate 0.0104   Epoch: 13   Global Step: 168430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:40:09,866-Speed 3056.19 samples/sec   Loss 4.7186   LearningRate 0.0104   Epoch: 13   Global Step: 168440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:40:13,196-Speed 3075.79 samples/sec   Loss 4.6360   LearningRate 0.0104   Epoch: 13   Global Step: 168450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:40:16,505-Speed 3095.53 samples/sec   Loss 4.6470   LearningRate 0.0104   Epoch: 13   Global Step: 168460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:40:19,937-Speed 2984.78 samples/sec   Loss 4.5786   LearningRate 0.0104   Epoch: 13   Global Step: 168470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:40:23,336-Speed 3013.25 samples/sec   Loss 4.6980   LearningRate 0.0104   Epoch: 13   Global Step: 168480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:40:26,758-Speed 2993.22 samples/sec   Loss 4.6815   LearningRate 0.0104   Epoch: 13   Global Step: 168490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:40:30,262-Speed 2923.50 samples/sec   Loss 4.6478   LearningRate 0.0103   Epoch: 13   Global Step: 168500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:40:33,746-Speed 2939.98 samples/sec   Loss 4.6437   LearningRate 0.0103   Epoch: 13   Global Step: 168510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:40:37,049-Speed 3100.65 samples/sec   Loss 4.5455   LearningRate 0.0103   Epoch: 13   Global Step: 168520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:40:40,387-Speed 3069.08 samples/sec   Loss 4.7047   LearningRate 0.0103   Epoch: 13   Global Step: 168530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:40:43,850-Speed 2957.42 samples/sec   Loss 4.5456   LearningRate 0.0103   Epoch: 13   Global Step: 168540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:40:47,221-Speed 3038.39 samples/sec   Loss 4.5829   LearningRate 0.0103   Epoch: 13   Global Step: 168550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:40:50,693-Speed 2950.19 samples/sec   Loss 4.6094   LearningRate 0.0103   Epoch: 13   Global Step: 168560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:40:54,043-Speed 3058.17 samples/sec   Loss 4.6043   LearningRate 0.0103   Epoch: 13   Global Step: 168570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:40:57,458-Speed 2999.24 samples/sec   Loss 4.5760   LearningRate 0.0103   Epoch: 13   Global Step: 168580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:41:00,871-Speed 3001.39 samples/sec   Loss 4.6150   LearningRate 0.0103   Epoch: 13   Global Step: 168590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:04,277-Speed 3006.85 samples/sec   Loss 4.6336   LearningRate 0.0103   Epoch: 13   Global Step: 168600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:07,656-Speed 3032.41 samples/sec   Loss 4.6845   LearningRate 0.0103   Epoch: 13   Global Step: 168610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:11,167-Speed 2916.63 samples/sec   Loss 4.6552   LearningRate 0.0103   Epoch: 13   Global Step: 168620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:14,656-Speed 2936.26 samples/sec   Loss 4.5677   LearningRate 0.0103   Epoch: 13   Global Step: 168630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:18,065-Speed 3004.29 samples/sec   Loss 4.6249   LearningRate 0.0103   Epoch: 13   Global Step: 168640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:21,425-Speed 3049.06 samples/sec   Loss 4.7552   LearningRate 0.0103   Epoch: 13   Global Step: 168650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:24,817-Speed 3019.76 samples/sec   Loss 4.6523   LearningRate 0.0103   Epoch: 13   Global Step: 168660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:28,183-Speed 3043.10 samples/sec   Loss 4.6133   LearningRate 0.0103   Epoch: 13   Global Step: 168670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:31,626-Speed 2976.55 samples/sec   Loss 4.5851   LearningRate 0.0103   Epoch: 13   Global Step: 168680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:35,020-Speed 3018.28 samples/sec   Loss 4.6347   LearningRate 0.0103   Epoch: 13   Global Step: 168690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:41:38,377-Speed 3051.27 samples/sec   Loss 4.6774   LearningRate 0.0103   Epoch: 13   Global Step: 168700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:41:41,716-Speed 3067.05 samples/sec   Loss 4.6478   LearningRate 0.0103   Epoch: 13   Global Step: 168710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:41:45,125-Speed 3005.28 samples/sec   Loss 4.6018   LearningRate 0.0103   Epoch: 13   Global Step: 168720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:41:48,518-Speed 3019.03 samples/sec   Loss 4.6116   LearningRate 0.0103   Epoch: 13   Global Step: 168730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:51,885-Speed 3042.10 samples/sec   Loss 4.6685   LearningRate 0.0103   Epoch: 13   Global Step: 168740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:55,229-Speed 3063.47 samples/sec   Loss 4.5903   LearningRate 0.0103   Epoch: 13   Global Step: 168750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:41:58,612-Speed 3027.34 samples/sec   Loss 4.6542   LearningRate 0.0103   Epoch: 13   Global Step: 168760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:01,966-Speed 3054.70 samples/sec   Loss 4.7216   LearningRate 0.0103   Epoch: 13   Global Step: 168770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:05,335-Speed 3039.71 samples/sec   Loss 4.6160   LearningRate 0.0103   Epoch: 13   Global Step: 168780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:08,754-Speed 2996.77 samples/sec   Loss 4.4893   LearningRate 0.0103   Epoch: 13   Global Step: 168790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:12,133-Speed 3030.32 samples/sec   Loss 4.6198   LearningRate 0.0103   Epoch: 13   Global Step: 168800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:15,521-Speed 3023.99 samples/sec   Loss 4.5799   LearningRate 0.0103   Epoch: 13   Global Step: 168810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:18,886-Speed 3043.58 samples/sec   Loss 4.6046   LearningRate 0.0103   Epoch: 13   Global Step: 168820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:22,348-Speed 2958.95 samples/sec   Loss 4.5694   LearningRate 0.0103   Epoch: 13   Global Step: 168830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:42:25,767-Speed 2995.49 samples/sec   Loss 4.6330   LearningRate 0.0103   Epoch: 13   Global Step: 168840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:29,194-Speed 2989.14 samples/sec   Loss 4.6551   LearningRate 0.0103   Epoch: 13   Global Step: 168850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:32,527-Speed 3072.72 samples/sec   Loss 4.6205   LearningRate 0.0103   Epoch: 13   Global Step: 168860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:35,865-Speed 3068.74 samples/sec   Loss 4.6515   LearningRate 0.0103   Epoch: 13   Global Step: 168870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:39,213-Speed 3059.03 samples/sec   Loss 4.6771   LearningRate 0.0103   Epoch: 13   Global Step: 168880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:42,526-Speed 3092.69 samples/sec   Loss 4.6337   LearningRate 0.0102   Epoch: 13   Global Step: 168890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:45,860-Speed 3071.61 samples/sec   Loss 4.5786   LearningRate 0.0102   Epoch: 13   Global Step: 168900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:49,207-Speed 3060.47 samples/sec   Loss 4.6010   LearningRate 0.0102   Epoch: 13   Global Step: 168910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:52,549-Speed 3064.85 samples/sec   Loss 4.7031   LearningRate 0.0102   Epoch: 13   Global Step: 168920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:55,936-Speed 3024.56 samples/sec   Loss 4.5672   LearningRate 0.0102   Epoch: 13   Global Step: 168930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:42:59,386-Speed 2968.19 samples/sec   Loss 4.6350   LearningRate 0.0102   Epoch: 13   Global Step: 168940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:43:02,843-Speed 2963.15 samples/sec   Loss 4.6589   LearningRate 0.0102   Epoch: 13   Global Step: 168950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 17:43:06,161-Speed 3087.30 samples/sec   Loss 4.6117   LearningRate 0.0102   Epoch: 13   Global Step: 168960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:43:09,546-Speed 3025.78 samples/sec   Loss 4.5789   LearningRate 0.0102   Epoch: 13   Global Step: 168970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:43:12,956-Speed 3004.03 samples/sec   Loss 4.6517   LearningRate 0.0102   Epoch: 13   Global Step: 168980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 17:43:16,366-Speed 3003.70 samples/sec   Loss 4.5911   LearningRate 0.0102   Epoch: 13   Global Step: 168990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:43:19,749-Speed 3027.46 samples/sec   Loss 4.6539   LearningRate 0.0102   Epoch: 13   Global Step: 169000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:43:23,163-Speed 3000.43 samples/sec   Loss 4.7365   LearningRate 0.0102   Epoch: 13   Global Step: 169010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:43:26,562-Speed 3013.30 samples/sec   Loss 4.6660   LearningRate 0.0102   Epoch: 13   Global Step: 169020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:43:29,941-Speed 3031.20 samples/sec   Loss 4.5464   LearningRate 0.0102   Epoch: 13   Global Step: 169030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:43:33,374-Speed 2983.47 samples/sec   Loss 4.4815   LearningRate 0.0102   Epoch: 13   Global Step: 169040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:43:36,770-Speed 3017.08 samples/sec   Loss 4.5814   LearningRate 0.0102   Epoch: 13   Global Step: 169050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:43:40,116-Speed 3060.37 samples/sec   Loss 4.5547   LearningRate 0.0102   Epoch: 13   Global Step: 169060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:43:43,468-Speed 3055.66 samples/sec   Loss 4.5924   LearningRate 0.0102   Epoch: 13   Global Step: 169070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:43:46,811-Speed 3064.67 samples/sec   Loss 4.6666   LearningRate 0.0102   Epoch: 13   Global Step: 169080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:43:50,232-Speed 2993.48 samples/sec   Loss 4.5471   LearningRate 0.0102   Epoch: 13   Global Step: 169090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:43:53,647-Speed 2999.87 samples/sec   Loss 4.6169   LearningRate 0.0102   Epoch: 13   Global Step: 169100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:43:57,069-Speed 2993.08 samples/sec   Loss 4.6934   LearningRate 0.0102   Epoch: 13   Global Step: 169110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:00,454-Speed 3025.76 samples/sec   Loss 4.5214   LearningRate 0.0102   Epoch: 13   Global Step: 169120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:03,853-Speed 3014.04 samples/sec   Loss 4.6395   LearningRate 0.0102   Epoch: 13   Global Step: 169130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:07,243-Speed 3021.04 samples/sec   Loss 4.5803   LearningRate 0.0102   Epoch: 13   Global Step: 169140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:10,644-Speed 3012.18 samples/sec   Loss 4.5924   LearningRate 0.0102   Epoch: 13   Global Step: 169150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:14,111-Speed 2954.56 samples/sec   Loss 4.6483   LearningRate 0.0102   Epoch: 13   Global Step: 169160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:17,568-Speed 2962.57 samples/sec   Loss 4.6952   LearningRate 0.0102   Epoch: 13   Global Step: 169170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:20,896-Speed 3077.94 samples/sec   Loss 4.6977   LearningRate 0.0102   Epoch: 13   Global Step: 169180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:24,310-Speed 3000.80 samples/sec   Loss 4.5806   LearningRate 0.0102   Epoch: 13   Global Step: 169190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:27,725-Speed 2999.32 samples/sec   Loss 4.6938   LearningRate 0.0102   Epoch: 13   Global Step: 169200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:31,169-Speed 2974.21 samples/sec   Loss 4.6679   LearningRate 0.0102   Epoch: 13   Global Step: 169210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:34,635-Speed 2955.04 samples/sec   Loss 4.6245   LearningRate 0.0102   Epoch: 13   Global Step: 169220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:38,106-Speed 2951.07 samples/sec   Loss 4.6316   LearningRate 0.0102   Epoch: 13   Global Step: 169230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:41,527-Speed 2994.23 samples/sec   Loss 4.6728   LearningRate 0.0102   Epoch: 13   Global Step: 169240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:44,999-Speed 2949.76 samples/sec   Loss 4.5873   LearningRate 0.0102   Epoch: 13   Global Step: 169250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:48,438-Speed 2978.52 samples/sec   Loss 4.6183   LearningRate 0.0102   Epoch: 13   Global Step: 169260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:44:51,841-Speed 3009.99 samples/sec   Loss 4.7089   LearningRate 0.0102   Epoch: 13   Global Step: 169270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:44:55,241-Speed 3012.36 samples/sec   Loss 4.6653   LearningRate 0.0101   Epoch: 13   Global Step: 169280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:44:58,610-Speed 3040.84 samples/sec   Loss 4.7332   LearningRate 0.0101   Epoch: 13   Global Step: 169290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:45:02,031-Speed 2993.78 samples/sec   Loss 4.5655   LearningRate 0.0101   Epoch: 13   Global Step: 169300   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 17:45:05,434-Speed 3010.42 samples/sec   Loss 4.6386   LearningRate 0.0101   Epoch: 13   Global Step: 169310   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 17:45:08,852-Speed 2996.60 samples/sec   Loss 4.7037   LearningRate 0.0101   Epoch: 13   Global Step: 169320   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 17:45:12,220-Speed 3041.30 samples/sec   Loss 4.6514   LearningRate 0.0101   Epoch: 13   Global Step: 169330   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 17:45:15,676-Speed 2963.94 samples/sec   Loss 4.6034   LearningRate 0.0101   Epoch: 13   Global Step: 169340   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 17:45:19,059-Speed 3028.43 samples/sec   Loss 4.6502   LearningRate 0.0101   Epoch: 13   Global Step: 169350   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 17:45:22,490-Speed 2984.65 samples/sec   Loss 4.5436   LearningRate 0.0101   Epoch: 13   Global Step: 169360   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 17:45:25,950-Speed 2960.57 samples/sec   Loss 4.6776   LearningRate 0.0101   Epoch: 13   Global Step: 169370   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 17:45:29,283-Speed 3073.19 samples/sec   Loss 4.6641   LearningRate 0.0101   Epoch: 13   Global Step: 169380   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 17:45:32,607-Speed 3081.16 samples/sec   Loss 4.5034   LearningRate 0.0101   Epoch: 13   Global Step: 169390   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 17:45:36,049-Speed 2976.23 samples/sec   Loss 4.5805   LearningRate 0.0101   Epoch: 13   Global Step: 169400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:45:39,445-Speed 3015.65 samples/sec   Loss 4.5877   LearningRate 0.0101   Epoch: 13   Global Step: 169410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:45:42,821-Speed 3034.89 samples/sec   Loss 4.5997   LearningRate 0.0101   Epoch: 13   Global Step: 169420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:45:46,167-Speed 3061.45 samples/sec   Loss 4.6745   LearningRate 0.0101   Epoch: 13   Global Step: 169430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:45:49,496-Speed 3076.09 samples/sec   Loss 4.6028   LearningRate 0.0101   Epoch: 13   Global Step: 169440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:45:52,839-Speed 3064.88 samples/sec   Loss 4.6352   LearningRate 0.0101   Epoch: 13   Global Step: 169450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:45:56,160-Speed 3083.53 samples/sec   Loss 4.6447   LearningRate 0.0101   Epoch: 13   Global Step: 169460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:45:59,592-Speed 2984.92 samples/sec   Loss 4.7088   LearningRate 0.0101   Epoch: 13   Global Step: 169470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:46:02,959-Speed 3041.99 samples/sec   Loss 4.5846   LearningRate 0.0101   Epoch: 13   Global Step: 169480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:46:06,350-Speed 3020.88 samples/sec   Loss 4.6459   LearningRate 0.0101   Epoch: 13   Global Step: 169490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:46:09,699-Speed 3057.78 samples/sec   Loss 4.6056   LearningRate 0.0101   Epoch: 13   Global Step: 169500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:46:13,032-Speed 3073.55 samples/sec   Loss 4.6275   LearningRate 0.0101   Epoch: 13   Global Step: 169510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:46:16,385-Speed 3054.39 samples/sec   Loss 4.5880   LearningRate 0.0101   Epoch: 13   Global Step: 169520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:46:19,745-Speed 3048.95 samples/sec   Loss 4.6497   LearningRate 0.0101   Epoch: 13   Global Step: 169530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:46:23,079-Speed 3072.32 samples/sec   Loss 4.4877   LearningRate 0.0101   Epoch: 13   Global Step: 169540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:46:26,391-Speed 3092.57 samples/sec   Loss 4.6582   LearningRate 0.0101   Epoch: 13   Global Step: 169550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:46:29,721-Speed 3075.29 samples/sec   Loss 4.5788   LearningRate 0.0101   Epoch: 13   Global Step: 169560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:46:33,053-Speed 3074.38 samples/sec   Loss 4.6264   LearningRate 0.0101   Epoch: 13   Global Step: 169570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:46:36,391-Speed 3068.14 samples/sec   Loss 4.6782   LearningRate 0.0101   Epoch: 13   Global Step: 169580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:46:39,754-Speed 3046.38 samples/sec   Loss 4.5482   LearningRate 0.0101   Epoch: 13   Global Step: 169590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:46:43,101-Speed 3060.19 samples/sec   Loss 4.5879   LearningRate 0.0101   Epoch: 13   Global Step: 169600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:46:46,485-Speed 3026.40 samples/sec   Loss 4.6266   LearningRate 0.0101   Epoch: 13   Global Step: 169610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:46:49,866-Speed 3029.79 samples/sec   Loss 4.5989   LearningRate 0.0101   Epoch: 13   Global Step: 169620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:46:53,215-Speed 3058.94 samples/sec   Loss 4.6403   LearningRate 0.0101   Epoch: 13   Global Step: 169630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:46:56,576-Speed 3047.60 samples/sec   Loss 4.5684   LearningRate 0.0101   Epoch: 13   Global Step: 169640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:46:59,913-Speed 3068.56 samples/sec   Loss 4.5716   LearningRate 0.0101   Epoch: 13   Global Step: 169650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:47:03,268-Speed 3052.99 samples/sec   Loss 4.5795   LearningRate 0.0101   Epoch: 13   Global Step: 169660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:47:06,614-Speed 3061.76 samples/sec   Loss 4.5859   LearningRate 0.0100   Epoch: 13   Global Step: 169670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:47:09,982-Speed 3040.89 samples/sec   Loss 4.6087   LearningRate 0.0100   Epoch: 13   Global Step: 169680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:47:13,444-Speed 2958.90 samples/sec   Loss 4.6142   LearningRate 0.0100   Epoch: 13   Global Step: 169690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:47:16,830-Speed 3024.65 samples/sec   Loss 4.5900   LearningRate 0.0100   Epoch: 13   Global Step: 169700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:47:20,285-Speed 2964.85 samples/sec   Loss 4.6295   LearningRate 0.0100   Epoch: 13   Global Step: 169710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:47:23,665-Speed 3030.33 samples/sec   Loss 4.4403   LearningRate 0.0100   Epoch: 13   Global Step: 169720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:47:27,019-Speed 3053.77 samples/sec   Loss 4.6754   LearningRate 0.0100   Epoch: 13   Global Step: 169730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:47:30,401-Speed 3029.14 samples/sec   Loss 4.6065   LearningRate 0.0100   Epoch: 13   Global Step: 169740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:47:33,792-Speed 3020.26 samples/sec   Loss 4.5437   LearningRate 0.0100   Epoch: 13   Global Step: 169750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:47:37,172-Speed 3030.71 samples/sec   Loss 4.6918   LearningRate 0.0100   Epoch: 13   Global Step: 169760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:47:40,542-Speed 3039.57 samples/sec   Loss 4.5893   LearningRate 0.0100   Epoch: 13   Global Step: 169770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:47:43,881-Speed 3067.36 samples/sec   Loss 4.6490   LearningRate 0.0100   Epoch: 13   Global Step: 169780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:47:47,268-Speed 3024.36 samples/sec   Loss 4.6731   LearningRate 0.0100   Epoch: 13   Global Step: 169790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:47:50,718-Speed 2968.45 samples/sec   Loss 4.6149   LearningRate 0.0100   Epoch: 13   Global Step: 169800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:47:54,105-Speed 3024.30 samples/sec   Loss 4.6237   LearningRate 0.0100   Epoch: 13   Global Step: 169810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:47:57,536-Speed 2985.10 samples/sec   Loss 4.6895   LearningRate 0.0100   Epoch: 13   Global Step: 169820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:48:00,878-Speed 3065.03 samples/sec   Loss 4.6079   LearningRate 0.0100   Epoch: 13   Global Step: 169830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:48:04,200-Speed 3083.79 samples/sec   Loss 4.6561   LearningRate 0.0100   Epoch: 13   Global Step: 169840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:48:07,513-Speed 3091.13 samples/sec   Loss 4.7246   LearningRate 0.0100   Epoch: 13   Global Step: 169850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:48:10,812-Speed 3105.36 samples/sec   Loss 4.5843   LearningRate 0.0100   Epoch: 13   Global Step: 169860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:48:14,142-Speed 3075.08 samples/sec   Loss 4.5841   LearningRate 0.0100   Epoch: 13   Global Step: 169870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:48:17,548-Speed 3007.40 samples/sec   Loss 4.5679   LearningRate 0.0100   Epoch: 13   Global Step: 169880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:48:20,898-Speed 3057.81 samples/sec   Loss 4.6607   LearningRate 0.0100   Epoch: 13   Global Step: 169890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:48:24,252-Speed 3054.20 samples/sec   Loss 4.5875   LearningRate 0.0100   Epoch: 13   Global Step: 169900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:48:27,637-Speed 3025.69 samples/sec   Loss 4.6695   LearningRate 0.0100   Epoch: 13   Global Step: 169910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:48:31,065-Speed 2988.42 samples/sec   Loss 4.6340   LearningRate 0.0100   Epoch: 13   Global Step: 169920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:48:34,397-Speed 3074.24 samples/sec   Loss 4.6723   LearningRate 0.0100   Epoch: 13   Global Step: 169930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:48:37,849-Speed 2967.33 samples/sec   Loss 4.6450   LearningRate 0.0100   Epoch: 13   Global Step: 169940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:48:41,246-Speed 3014.81 samples/sec   Loss 4.6067   LearningRate 0.0100   Epoch: 13   Global Step: 169950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:48:44,604-Speed 3050.29 samples/sec   Loss 4.6011   LearningRate 0.0100   Epoch: 13   Global Step: 169960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:48:47,956-Speed 3055.99 samples/sec   Loss 4.6231   LearningRate 0.0100   Epoch: 13   Global Step: 169970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:48:51,289-Speed 3073.09 samples/sec   Loss 4.6722   LearningRate 0.0100   Epoch: 13   Global Step: 169980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:48:54,680-Speed 3020.74 samples/sec   Loss 4.6300   LearningRate 0.0100   Epoch: 13   Global Step: 169990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:48:58,090-Speed 3003.25 samples/sec   Loss 4.5859   LearningRate 0.0100   Epoch: 13   Global Step: 170000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:01,552-Speed 2959.60 samples/sec   Loss 4.6593   LearningRate 0.0100   Epoch: 13   Global Step: 170010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:04,955-Speed 3010.28 samples/sec   Loss 4.5319   LearningRate 0.0100   Epoch: 13   Global Step: 170020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:08,374-Speed 2996.09 samples/sec   Loss 4.5513   LearningRate 0.0100   Epoch: 13   Global Step: 170030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:49:11,799-Speed 2990.13 samples/sec   Loss 4.5522   LearningRate 0.0100   Epoch: 13   Global Step: 170040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:49:15,217-Speed 2996.86 samples/sec   Loss 4.6785   LearningRate 0.0100   Epoch: 13   Global Step: 170050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:49:18,636-Speed 2995.96 samples/sec   Loss 4.5711   LearningRate 0.0099   Epoch: 13   Global Step: 170060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:49:22,032-Speed 3015.85 samples/sec   Loss 4.6146   LearningRate 0.0099   Epoch: 13   Global Step: 170070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:49:25,436-Speed 3008.84 samples/sec   Loss 4.5461   LearningRate 0.0099   Epoch: 13   Global Step: 170080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:28,800-Speed 3045.06 samples/sec   Loss 4.6550   LearningRate 0.0099   Epoch: 13   Global Step: 170090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:32,185-Speed 3026.33 samples/sec   Loss 4.6479   LearningRate 0.0099   Epoch: 13   Global Step: 170100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:35,536-Speed 3056.28 samples/sec   Loss 4.5617   LearningRate 0.0099   Epoch: 13   Global Step: 170110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:38,909-Speed 3037.23 samples/sec   Loss 4.7012   LearningRate 0.0099   Epoch: 13   Global Step: 170120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:42,264-Speed 3052.84 samples/sec   Loss 4.5686   LearningRate 0.0099   Epoch: 13   Global Step: 170130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:45,680-Speed 2998.35 samples/sec   Loss 4.6235   LearningRate 0.0099   Epoch: 13   Global Step: 170140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:49,066-Speed 3024.60 samples/sec   Loss 4.5815   LearningRate 0.0099   Epoch: 13   Global Step: 170150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:52,465-Speed 3013.55 samples/sec   Loss 4.5687   LearningRate 0.0099   Epoch: 13   Global Step: 170160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:55,878-Speed 3001.51 samples/sec   Loss 4.6090   LearningRate 0.0099   Epoch: 13   Global Step: 170170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:49:59,234-Speed 3052.27 samples/sec   Loss 4.5819   LearningRate 0.0099   Epoch: 13   Global Step: 170180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:50:02,635-Speed 3011.35 samples/sec   Loss 4.5474   LearningRate 0.0099   Epoch: 13   Global Step: 170190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:50:06,016-Speed 3029.40 samples/sec   Loss 4.6983   LearningRate 0.0099   Epoch: 13   Global Step: 170200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:50:09,481-Speed 2956.61 samples/sec   Loss 4.6758   LearningRate 0.0099   Epoch: 13   Global Step: 170210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:50:12,921-Speed 2977.11 samples/sec   Loss 4.6669   LearningRate 0.0099   Epoch: 13   Global Step: 170220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:50:16,395-Speed 2948.26 samples/sec   Loss 4.6680   LearningRate 0.0099   Epoch: 13   Global Step: 170230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:50:19,827-Speed 2985.21 samples/sec   Loss 4.5775   LearningRate 0.0099   Epoch: 13   Global Step: 170240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:50:23,213-Speed 3024.76 samples/sec   Loss 4.6177   LearningRate 0.0099   Epoch: 13   Global Step: 170250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:50:26,650-Speed 2980.14 samples/sec   Loss 4.5903   LearningRate 0.0099   Epoch: 13   Global Step: 170260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:50:30,060-Speed 3003.75 samples/sec   Loss 4.5853   LearningRate 0.0099   Epoch: 13   Global Step: 170270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:50:33,431-Speed 3038.97 samples/sec   Loss 4.6097   LearningRate 0.0099   Epoch: 13   Global Step: 170280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:50:37,515-Speed 2507.83 samples/sec   Loss 4.5644   LearningRate 0.0099   Epoch: 13   Global Step: 170290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:50:40,876-Speed 3047.92 samples/sec   Loss 4.6808   LearningRate 0.0099   Epoch: 13   Global Step: 170300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:50:44,268-Speed 3019.26 samples/sec   Loss 4.5768   LearningRate 0.0099   Epoch: 13   Global Step: 170310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:50:48,366-Speed 2499.58 samples/sec   Loss 4.4914   LearningRate 0.0099   Epoch: 13   Global Step: 170320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:50:52,255-Speed 2633.56 samples/sec   Loss 4.5549   LearningRate 0.0099   Epoch: 13   Global Step: 170330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:50:56,145-Speed 2633.07 samples/sec   Loss 4.6531   LearningRate 0.0099   Epoch: 13   Global Step: 170340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:51:00,130-Speed 2570.08 samples/sec   Loss 4.5802   LearningRate 0.0099   Epoch: 13   Global Step: 170350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:51:03,534-Speed 3009.63 samples/sec   Loss 4.6807   LearningRate 0.0099   Epoch: 13   Global Step: 170360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:51:06,876-Speed 3064.88 samples/sec   Loss 4.5925   LearningRate 0.0099   Epoch: 13   Global Step: 170370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:51:10,235-Speed 3049.40 samples/sec   Loss 4.5716   LearningRate 0.0099   Epoch: 13   Global Step: 170380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:51:13,545-Speed 3093.90 samples/sec   Loss 4.6470   LearningRate 0.0099   Epoch: 13   Global Step: 170390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:51:16,910-Speed 3044.70 samples/sec   Loss 4.5788   LearningRate 0.0099   Epoch: 13   Global Step: 170400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:51:20,272-Speed 3046.52 samples/sec   Loss 4.5656   LearningRate 0.0099   Epoch: 13   Global Step: 170410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:51:23,660-Speed 3023.46 samples/sec   Loss 4.5900   LearningRate 0.0099   Epoch: 13   Global Step: 170420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:51:27,044-Speed 3026.22 samples/sec   Loss 4.5974   LearningRate 0.0099   Epoch: 13   Global Step: 170430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:51:30,488-Speed 2973.75 samples/sec   Loss 4.5051   LearningRate 0.0099   Epoch: 13   Global Step: 170440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:51:33,859-Speed 3039.17 samples/sec   Loss 4.5527   LearningRate 0.0099   Epoch: 13   Global Step: 170450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:51:37,201-Speed 3064.05 samples/sec   Loss 4.6159   LearningRate 0.0098   Epoch: 13   Global Step: 170460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:51:40,585-Speed 3027.55 samples/sec   Loss 4.6260   LearningRate 0.0098   Epoch: 13   Global Step: 170470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:51:43,941-Speed 3051.44 samples/sec   Loss 4.6425   LearningRate 0.0098   Epoch: 13   Global Step: 170480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:51:47,338-Speed 3015.75 samples/sec   Loss 4.5790   LearningRate 0.0098   Epoch: 13   Global Step: 170490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:51:50,707-Speed 3040.21 samples/sec   Loss 4.5319   LearningRate 0.0098   Epoch: 13   Global Step: 170500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:51:54,142-Speed 2981.93 samples/sec   Loss 4.5593   LearningRate 0.0098   Epoch: 13   Global Step: 170510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:51:57,561-Speed 2995.74 samples/sec   Loss 4.6261   LearningRate 0.0098   Epoch: 13   Global Step: 170520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:52:00,910-Speed 3058.53 samples/sec   Loss 4.6438   LearningRate 0.0098   Epoch: 13   Global Step: 170530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:52:04,274-Speed 3044.69 samples/sec   Loss 4.5741   LearningRate 0.0098   Epoch: 13   Global Step: 170540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:52:07,665-Speed 3020.43 samples/sec   Loss 4.6203   LearningRate 0.0098   Epoch: 13   Global Step: 170550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:52:11,113-Speed 2971.14 samples/sec   Loss 4.5380   LearningRate 0.0098   Epoch: 13   Global Step: 170560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:52:14,518-Speed 3008.27 samples/sec   Loss 4.5823   LearningRate 0.0098   Epoch: 13   Global Step: 170570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:52:17,921-Speed 3010.35 samples/sec   Loss 4.6494   LearningRate 0.0098   Epoch: 13   Global Step: 170580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:52:21,334-Speed 3000.82 samples/sec   Loss 4.6241   LearningRate 0.0098   Epoch: 13   Global Step: 170590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:52:24,795-Speed 2959.74 samples/sec   Loss 4.6043   LearningRate 0.0098   Epoch: 13   Global Step: 170600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:52:28,278-Speed 2940.79 samples/sec   Loss 4.5805   LearningRate 0.0098   Epoch: 13   Global Step: 170610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:52:31,641-Speed 3045.58 samples/sec   Loss 4.7117   LearningRate 0.0098   Epoch: 13   Global Step: 170620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:52:35,005-Speed 3044.59 samples/sec   Loss 4.5873   LearningRate 0.0098   Epoch: 13   Global Step: 170630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:52:38,374-Speed 3041.28 samples/sec   Loss 4.6574   LearningRate 0.0098   Epoch: 13   Global Step: 170640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:52:41,735-Speed 3047.10 samples/sec   Loss 4.5453   LearningRate 0.0098   Epoch: 13   Global Step: 170650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:52:45,203-Speed 2953.71 samples/sec   Loss 4.5826   LearningRate 0.0098   Epoch: 13   Global Step: 170660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:52:48,656-Speed 2966.14 samples/sec   Loss 4.5775   LearningRate 0.0098   Epoch: 13   Global Step: 170670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:52:52,755-Speed 2498.97 samples/sec   Loss 4.6405   LearningRate 0.0098   Epoch: 13   Global Step: 170680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:52:56,191-Speed 2980.32 samples/sec   Loss 4.5770   LearningRate 0.0098   Epoch: 13   Global Step: 170690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:53:00,359-Speed 2457.59 samples/sec   Loss 4.5071   LearningRate 0.0098   Epoch: 13   Global Step: 170700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:53:03,718-Speed 3049.31 samples/sec   Loss 4.6303   LearningRate 0.0098   Epoch: 13   Global Step: 170710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:53:07,047-Speed 3077.54 samples/sec   Loss 4.5636   LearningRate 0.0098   Epoch: 13   Global Step: 170720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:53:10,403-Speed 3051.67 samples/sec   Loss 4.6215   LearningRate 0.0098   Epoch: 13   Global Step: 170730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:53:13,783-Speed 3031.28 samples/sec   Loss 4.5691   LearningRate 0.0098   Epoch: 13   Global Step: 170740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:53:17,172-Speed 3022.22 samples/sec   Loss 4.6144   LearningRate 0.0098   Epoch: 13   Global Step: 170750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:53:20,486-Speed 3090.77 samples/sec   Loss 4.5597   LearningRate 0.0098   Epoch: 13   Global Step: 170760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:53:23,876-Speed 3021.19 samples/sec   Loss 4.6733   LearningRate 0.0098   Epoch: 13   Global Step: 170770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:53:27,241-Speed 3043.54 samples/sec   Loss 4.5460   LearningRate 0.0098   Epoch: 13   Global Step: 170780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:53:30,586-Speed 3062.57 samples/sec   Loss 4.5470   LearningRate 0.0098   Epoch: 13   Global Step: 170790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:53:34,012-Speed 2990.37 samples/sec   Loss 4.5775   LearningRate 0.0098   Epoch: 13   Global Step: 170800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:53:37,349-Speed 3068.84 samples/sec   Loss 4.6011   LearningRate 0.0098   Epoch: 13   Global Step: 170810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:53:40,692-Speed 3064.28 samples/sec   Loss 4.6242   LearningRate 0.0098   Epoch: 13   Global Step: 170820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:53:44,104-Speed 3001.64 samples/sec   Loss 4.6162   LearningRate 0.0098   Epoch: 13   Global Step: 170830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:53:47,510-Speed 3007.55 samples/sec   Loss 4.5590   LearningRate 0.0098   Epoch: 13   Global Step: 170840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:53:50,877-Speed 3042.54 samples/sec   Loss 4.5869   LearningRate 0.0098   Epoch: 13   Global Step: 170850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:53:54,183-Speed 3097.89 samples/sec   Loss 4.6515   LearningRate 0.0097   Epoch: 13   Global Step: 170860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:53:57,587-Speed 3009.70 samples/sec   Loss 4.5349   LearningRate 0.0097   Epoch: 13   Global Step: 170870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:01,061-Speed 2948.06 samples/sec   Loss 4.6559   LearningRate 0.0097   Epoch: 13   Global Step: 170880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:04,416-Speed 3052.72 samples/sec   Loss 4.6245   LearningRate 0.0097   Epoch: 13   Global Step: 170890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:07,810-Speed 3017.80 samples/sec   Loss 4.6019   LearningRate 0.0097   Epoch: 13   Global Step: 170900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:11,140-Speed 3075.76 samples/sec   Loss 4.5651   LearningRate 0.0097   Epoch: 13   Global Step: 170910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:14,454-Speed 3091.05 samples/sec   Loss 4.6349   LearningRate 0.0097   Epoch: 13   Global Step: 170920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:17,793-Speed 3067.88 samples/sec   Loss 4.6850   LearningRate 0.0097   Epoch: 13   Global Step: 170930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:21,129-Speed 3070.46 samples/sec   Loss 4.5999   LearningRate 0.0097   Epoch: 13   Global Step: 170940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:24,541-Speed 3001.93 samples/sec   Loss 4.5072   LearningRate 0.0097   Epoch: 13   Global Step: 170950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:27,995-Speed 2965.34 samples/sec   Loss 4.5634   LearningRate 0.0097   Epoch: 13   Global Step: 170960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:54:31,471-Speed 2946.71 samples/sec   Loss 4.6524   LearningRate 0.0097   Epoch: 13   Global Step: 170970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:34,854-Speed 3027.37 samples/sec   Loss 4.7182   LearningRate 0.0097   Epoch: 13   Global Step: 170980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:38,221-Speed 3042.01 samples/sec   Loss 4.6189   LearningRate 0.0097   Epoch: 13   Global Step: 170990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:41,606-Speed 3026.27 samples/sec   Loss 4.5055   LearningRate 0.0097   Epoch: 13   Global Step: 171000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:45,020-Speed 3000.43 samples/sec   Loss 4.6070   LearningRate 0.0097   Epoch: 13   Global Step: 171010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:48,415-Speed 3016.67 samples/sec   Loss 4.7112   LearningRate 0.0097   Epoch: 13   Global Step: 171020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:51,779-Speed 3045.00 samples/sec   Loss 4.6378   LearningRate 0.0097   Epoch: 13   Global Step: 171030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:55,265-Speed 2938.25 samples/sec   Loss 4.6077   LearningRate 0.0097   Epoch: 13   Global Step: 171040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:54:58,677-Speed 3001.39 samples/sec   Loss 4.6355   LearningRate 0.0097   Epoch: 13   Global Step: 171050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:02,105-Speed 2987.96 samples/sec   Loss 4.6266   LearningRate 0.0097   Epoch: 13   Global Step: 171060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:05,458-Speed 3054.89 samples/sec   Loss 4.5131   LearningRate 0.0097   Epoch: 13   Global Step: 171070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:55:08,827-Speed 3040.74 samples/sec   Loss 4.4804   LearningRate 0.0097   Epoch: 13   Global Step: 171080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:12,169-Speed 3064.81 samples/sec   Loss 4.6066   LearningRate 0.0097   Epoch: 13   Global Step: 171090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:15,520-Speed 3056.76 samples/sec   Loss 4.6362   LearningRate 0.0097   Epoch: 13   Global Step: 171100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:18,878-Speed 3050.27 samples/sec   Loss 4.5665   LearningRate 0.0097   Epoch: 13   Global Step: 171110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:22,301-Speed 2992.11 samples/sec   Loss 4.5169   LearningRate 0.0097   Epoch: 13   Global Step: 171120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:25,705-Speed 3009.15 samples/sec   Loss 4.6118   LearningRate 0.0097   Epoch: 13   Global Step: 171130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:29,031-Speed 3079.66 samples/sec   Loss 4.6513   LearningRate 0.0097   Epoch: 13   Global Step: 171140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:32,380-Speed 3058.57 samples/sec   Loss 4.6597   LearningRate 0.0097   Epoch: 13   Global Step: 171150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:35,799-Speed 2995.74 samples/sec   Loss 4.6586   LearningRate 0.0097   Epoch: 13   Global Step: 171160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:39,252-Speed 2966.70 samples/sec   Loss 4.6516   LearningRate 0.0097   Epoch: 13   Global Step: 171170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:42,613-Speed 3047.73 samples/sec   Loss 4.5985   LearningRate 0.0097   Epoch: 13   Global Step: 171180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:55:45,897-Speed 3118.32 samples/sec   Loss 4.6640   LearningRate 0.0097   Epoch: 13   Global Step: 171190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:49,341-Speed 2975.00 samples/sec   Loss 4.6439   LearningRate 0.0097   Epoch: 13   Global Step: 171200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:52,673-Speed 3074.22 samples/sec   Loss 4.6094   LearningRate 0.0097   Epoch: 13   Global Step: 171210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:56,076-Speed 3009.48 samples/sec   Loss 4.6768   LearningRate 0.0097   Epoch: 13   Global Step: 171220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:55:59,535-Speed 2961.48 samples/sec   Loss 4.5859   LearningRate 0.0097   Epoch: 13   Global Step: 171230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:56:02,915-Speed 3030.07 samples/sec   Loss 4.6760   LearningRate 0.0097   Epoch: 13   Global Step: 171240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:56:06,286-Speed 3038.68 samples/sec   Loss 4.5592   LearningRate 0.0096   Epoch: 13   Global Step: 171250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:56:09,651-Speed 3044.23 samples/sec   Loss 4.4969   LearningRate 0.0096   Epoch: 13   Global Step: 171260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:56:12,997-Speed 3061.14 samples/sec   Loss 4.5286   LearningRate 0.0096   Epoch: 13   Global Step: 171270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:56:16,347-Speed 3056.95 samples/sec   Loss 4.6260   LearningRate 0.0096   Epoch: 13   Global Step: 171280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:56:19,686-Speed 3067.67 samples/sec   Loss 4.7817   LearningRate 0.0096   Epoch: 13   Global Step: 171290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:56:23,015-Speed 3076.85 samples/sec   Loss 4.6172   LearningRate 0.0096   Epoch: 13   Global Step: 171300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:56:26,408-Speed 3019.35 samples/sec   Loss 4.6031   LearningRate 0.0096   Epoch: 13   Global Step: 171310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:56:29,752-Speed 3063.12 samples/sec   Loss 4.5683   LearningRate 0.0096   Epoch: 13   Global Step: 171320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:56:33,060-Speed 3095.64 samples/sec   Loss 4.5636   LearningRate 0.0096   Epoch: 13   Global Step: 171330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:56:36,439-Speed 3031.68 samples/sec   Loss 4.5511   LearningRate 0.0096   Epoch: 13   Global Step: 171340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 17:56:39,823-Speed 3026.81 samples/sec   Loss 4.5838   LearningRate 0.0096   Epoch: 13   Global Step: 171350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:56:43,210-Speed 3024.41 samples/sec   Loss 4.5837   LearningRate 0.0096   Epoch: 13   Global Step: 171360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:56:46,563-Speed 3054.45 samples/sec   Loss 4.5673   LearningRate 0.0096   Epoch: 13   Global Step: 171370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:56:49,962-Speed 3013.37 samples/sec   Loss 4.6245   LearningRate 0.0096   Epoch: 13   Global Step: 171380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:56:53,296-Speed 3072.28 samples/sec   Loss 4.6617   LearningRate 0.0096   Epoch: 13   Global Step: 171390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:56:56,682-Speed 3025.47 samples/sec   Loss 4.5804   LearningRate 0.0096   Epoch: 13   Global Step: 171400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:00,054-Speed 3037.76 samples/sec   Loss 4.6436   LearningRate 0.0096   Epoch: 13   Global Step: 171410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:03,440-Speed 3024.44 samples/sec   Loss 4.5982   LearningRate 0.0096   Epoch: 13   Global Step: 171420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:06,760-Speed 3085.34 samples/sec   Loss 4.6068   LearningRate 0.0096   Epoch: 13   Global Step: 171430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:10,096-Speed 3070.79 samples/sec   Loss 4.5161   LearningRate 0.0096   Epoch: 13   Global Step: 171440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:13,377-Speed 3120.96 samples/sec   Loss 4.5152   LearningRate 0.0096   Epoch: 13   Global Step: 171450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:16,724-Speed 3060.44 samples/sec   Loss 4.6483   LearningRate 0.0096   Epoch: 13   Global Step: 171460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:20,056-Speed 3074.44 samples/sec   Loss 4.6118   LearningRate 0.0096   Epoch: 13   Global Step: 171470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:23,435-Speed 3030.80 samples/sec   Loss 4.5058   LearningRate 0.0096   Epoch: 13   Global Step: 171480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:26,826-Speed 3020.55 samples/sec   Loss 4.6476   LearningRate 0.0096   Epoch: 13   Global Step: 171490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:30,156-Speed 3075.84 samples/sec   Loss 4.6880   LearningRate 0.0096   Epoch: 13   Global Step: 171500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:33,513-Speed 3051.83 samples/sec   Loss 4.4881   LearningRate 0.0096   Epoch: 13   Global Step: 171510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:36,846-Speed 3072.96 samples/sec   Loss 4.6205   LearningRate 0.0096   Epoch: 13   Global Step: 171520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:40,212-Speed 3043.63 samples/sec   Loss 4.5923   LearningRate 0.0096   Epoch: 13   Global Step: 171530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:43,575-Speed 3044.99 samples/sec   Loss 4.4875   LearningRate 0.0096   Epoch: 13   Global Step: 171540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:57:46,888-Speed 3091.60 samples/sec   Loss 4.5929   LearningRate 0.0096   Epoch: 13   Global Step: 171550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:57:50,328-Speed 2977.37 samples/sec   Loss 4.6615   LearningRate 0.0096   Epoch: 13   Global Step: 171560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:57:53,725-Speed 3015.58 samples/sec   Loss 4.5416   LearningRate 0.0096   Epoch: 13   Global Step: 171570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:57:57,154-Speed 2987.08 samples/sec   Loss 4.5869   LearningRate 0.0096   Epoch: 13   Global Step: 171580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:58:00,569-Speed 2999.55 samples/sec   Loss 4.6508   LearningRate 0.0096   Epoch: 13   Global Step: 171590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:58:03,895-Speed 3080.09 samples/sec   Loss 4.6409   LearningRate 0.0096   Epoch: 13   Global Step: 171600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:07,268-Speed 3036.64 samples/sec   Loss 4.6764   LearningRate 0.0096   Epoch: 13   Global Step: 171610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:10,605-Speed 3069.25 samples/sec   Loss 4.5795   LearningRate 0.0096   Epoch: 13   Global Step: 171620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:14,021-Speed 2998.52 samples/sec   Loss 4.5325   LearningRate 0.0096   Epoch: 13   Global Step: 171630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:17,417-Speed 3016.75 samples/sec   Loss 4.6052   LearningRate 0.0096   Epoch: 13   Global Step: 171640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:20,779-Speed 3046.63 samples/sec   Loss 4.5624   LearningRate 0.0096   Epoch: 13   Global Step: 171650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:24,107-Speed 3077.58 samples/sec   Loss 4.4824   LearningRate 0.0095   Epoch: 13   Global Step: 171660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:27,517-Speed 3003.90 samples/sec   Loss 4.6415   LearningRate 0.0095   Epoch: 13   Global Step: 171670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:30,897-Speed 3031.02 samples/sec   Loss 4.5537   LearningRate 0.0095   Epoch: 13   Global Step: 171680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:34,258-Speed 3046.85 samples/sec   Loss 4.4879   LearningRate 0.0095   Epoch: 13   Global Step: 171690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:37,601-Speed 3064.08 samples/sec   Loss 4.5011   LearningRate 0.0095   Epoch: 13   Global Step: 171700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:41,021-Speed 2995.63 samples/sec   Loss 4.5347   LearningRate 0.0095   Epoch: 13   Global Step: 171710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:44,403-Speed 3028.37 samples/sec   Loss 4.5052   LearningRate 0.0095   Epoch: 13   Global Step: 171720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:47,743-Speed 3067.78 samples/sec   Loss 4.5594   LearningRate 0.0095   Epoch: 13   Global Step: 171730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:51,071-Speed 3078.14 samples/sec   Loss 4.5631   LearningRate 0.0095   Epoch: 13   Global Step: 171740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:54,488-Speed 2997.74 samples/sec   Loss 4.4670   LearningRate 0.0095   Epoch: 13   Global Step: 171750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:58:57,815-Speed 3078.41 samples/sec   Loss 4.5985   LearningRate 0.0095   Epoch: 13   Global Step: 171760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:01,260-Speed 2973.63 samples/sec   Loss 4.7160   LearningRate 0.0095   Epoch: 13   Global Step: 171770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:04,569-Speed 3094.72 samples/sec   Loss 4.5237   LearningRate 0.0095   Epoch: 13   Global Step: 171780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:07,949-Speed 3030.96 samples/sec   Loss 4.5822   LearningRate 0.0095   Epoch: 13   Global Step: 171790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:11,368-Speed 2996.05 samples/sec   Loss 4.5578   LearningRate 0.0095   Epoch: 13   Global Step: 171800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:59:14,763-Speed 3016.49 samples/sec   Loss 4.5884   LearningRate 0.0095   Epoch: 13   Global Step: 171810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:59:18,140-Speed 3033.21 samples/sec   Loss 4.6460   LearningRate 0.0095   Epoch: 13   Global Step: 171820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:59:21,481-Speed 3065.94 samples/sec   Loss 4.5701   LearningRate 0.0095   Epoch: 13   Global Step: 171830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 17:59:24,811-Speed 3075.71 samples/sec   Loss 4.7124   LearningRate 0.0095   Epoch: 13   Global Step: 171840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:28,252-Speed 2977.16 samples/sec   Loss 4.5470   LearningRate 0.0095   Epoch: 13   Global Step: 171850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:31,670-Speed 2997.01 samples/sec   Loss 4.5900   LearningRate 0.0095   Epoch: 13   Global Step: 171860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:35,145-Speed 2947.72 samples/sec   Loss 4.6059   LearningRate 0.0095   Epoch: 13   Global Step: 171870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:38,566-Speed 2993.50 samples/sec   Loss 4.5638   LearningRate 0.0095   Epoch: 13   Global Step: 171880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:41,996-Speed 2986.34 samples/sec   Loss 4.5648   LearningRate 0.0095   Epoch: 13   Global Step: 171890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:45,356-Speed 3049.05 samples/sec   Loss 4.5700   LearningRate 0.0095   Epoch: 13   Global Step: 171900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:48,802-Speed 2971.95 samples/sec   Loss 4.6634   LearningRate 0.0095   Epoch: 13   Global Step: 171910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:52,198-Speed 3016.21 samples/sec   Loss 4.6714   LearningRate 0.0095   Epoch: 13   Global Step: 171920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:55,605-Speed 3006.86 samples/sec   Loss 4.6334   LearningRate 0.0095   Epoch: 13   Global Step: 171930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 17:59:58,953-Speed 3058.90 samples/sec   Loss 4.5227   LearningRate 0.0095   Epoch: 13   Global Step: 171940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:00:02,406-Speed 2967.03 samples/sec   Loss 4.5810   LearningRate 0.0095   Epoch: 13   Global Step: 171950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:00:05,839-Speed 2983.47 samples/sec   Loss 4.5308   LearningRate 0.0095   Epoch: 13   Global Step: 171960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:00:09,237-Speed 3014.25 samples/sec   Loss 4.5847   LearningRate 0.0095   Epoch: 13   Global Step: 171970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:00:12,663-Speed 2992.62 samples/sec   Loss 4.5833   LearningRate 0.0095   Epoch: 13   Global Step: 171980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:00:16,119-Speed 2963.70 samples/sec   Loss 4.6671   LearningRate 0.0095   Epoch: 13   Global Step: 171990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:00:19,510-Speed 3020.08 samples/sec   Loss 4.5892   LearningRate 0.0095   Epoch: 13   Global Step: 172000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:00:22,890-Speed 3031.00 samples/sec   Loss 4.5168   LearningRate 0.0095   Epoch: 13   Global Step: 172010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:00:26,246-Speed 3051.98 samples/sec   Loss 4.5676   LearningRate 0.0095   Epoch: 13   Global Step: 172020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:00:29,629-Speed 3027.33 samples/sec   Loss 4.5959   LearningRate 0.0095   Epoch: 13   Global Step: 172030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:00:32,960-Speed 3074.85 samples/sec   Loss 4.5628   LearningRate 0.0095   Epoch: 13   Global Step: 172040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:00:36,414-Speed 2965.66 samples/sec   Loss 4.7159   LearningRate 0.0095   Epoch: 13   Global Step: 172050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:00:39,841-Speed 2989.05 samples/sec   Loss 4.5868   LearningRate 0.0094   Epoch: 13   Global Step: 172060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:00:43,213-Speed 3038.34 samples/sec   Loss 4.4951   LearningRate 0.0094   Epoch: 13   Global Step: 172070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:00:46,625-Speed 3001.65 samples/sec   Loss 4.5112   LearningRate 0.0094   Epoch: 13   Global Step: 172080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:00:49,969-Speed 3063.64 samples/sec   Loss 4.5934   LearningRate 0.0094   Epoch: 13   Global Step: 172090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:00:53,294-Speed 3080.59 samples/sec   Loss 4.6199   LearningRate 0.0094   Epoch: 13   Global Step: 172100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:00:56,715-Speed 2994.06 samples/sec   Loss 4.6145   LearningRate 0.0094   Epoch: 13   Global Step: 172110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:00,144-Speed 2987.16 samples/sec   Loss 4.5954   LearningRate 0.0094   Epoch: 13   Global Step: 172120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:03,604-Speed 2960.25 samples/sec   Loss 4.5702   LearningRate 0.0094   Epoch: 13   Global Step: 172130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:06,949-Speed 3062.18 samples/sec   Loss 4.4849   LearningRate 0.0094   Epoch: 13   Global Step: 172140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:10,355-Speed 3008.06 samples/sec   Loss 4.5591   LearningRate 0.0094   Epoch: 13   Global Step: 172150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:13,745-Speed 3020.58 samples/sec   Loss 4.5740   LearningRate 0.0094   Epoch: 13   Global Step: 172160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:17,208-Speed 2958.37 samples/sec   Loss 4.5334   LearningRate 0.0094   Epoch: 13   Global Step: 172170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:20,614-Speed 3007.38 samples/sec   Loss 4.6222   LearningRate 0.0094   Epoch: 13   Global Step: 172180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:24,054-Speed 2977.92 samples/sec   Loss 4.6609   LearningRate 0.0094   Epoch: 13   Global Step: 172190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:27,464-Speed 3003.79 samples/sec   Loss 4.5619   LearningRate 0.0094   Epoch: 13   Global Step: 172200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:31,038-Speed 2866.16 samples/sec   Loss 4.5079   LearningRate 0.0094   Epoch: 13   Global Step: 172210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:34,398-Speed 3048.79 samples/sec   Loss 4.6269   LearningRate 0.0094   Epoch: 13   Global Step: 172220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:37,747-Speed 3058.33 samples/sec   Loss 4.6023   LearningRate 0.0094   Epoch: 13   Global Step: 172230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:41,158-Speed 3003.42 samples/sec   Loss 4.5041   LearningRate 0.0094   Epoch: 13   Global Step: 172240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:44,633-Speed 2947.03 samples/sec   Loss 4.6197   LearningRate 0.0094   Epoch: 13   Global Step: 172250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:48,012-Speed 3031.65 samples/sec   Loss 4.5960   LearningRate 0.0094   Epoch: 13   Global Step: 172260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:51,359-Speed 3060.54 samples/sec   Loss 4.5828   LearningRate 0.0094   Epoch: 13   Global Step: 172270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:54,699-Speed 3066.89 samples/sec   Loss 4.5634   LearningRate 0.0094   Epoch: 13   Global Step: 172280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:01:58,115-Speed 2998.46 samples/sec   Loss 4.5434   LearningRate 0.0094   Epoch: 13   Global Step: 172290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:02:01,491-Speed 3033.44 samples/sec   Loss 4.5823   LearningRate 0.0094   Epoch: 13   Global Step: 172300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:02:04,845-Speed 3054.16 samples/sec   Loss 4.6175   LearningRate 0.0094   Epoch: 13   Global Step: 172310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:02:08,198-Speed 3054.89 samples/sec   Loss 4.4836   LearningRate 0.0094   Epoch: 13   Global Step: 172320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:02:11,555-Speed 3051.40 samples/sec   Loss 4.5355   LearningRate 0.0094   Epoch: 13   Global Step: 172330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:02:14,910-Speed 3053.29 samples/sec   Loss 4.6161   LearningRate 0.0094   Epoch: 13   Global Step: 172340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:02:18,312-Speed 3010.37 samples/sec   Loss 4.5450   LearningRate 0.0094   Epoch: 13   Global Step: 172350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:02:21,674-Speed 3046.74 samples/sec   Loss 4.5673   LearningRate 0.0094   Epoch: 13   Global Step: 172360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:02:25,030-Speed 3052.20 samples/sec   Loss 4.6223   LearningRate 0.0094   Epoch: 13   Global Step: 172370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:02:28,455-Speed 2990.60 samples/sec   Loss 4.5394   LearningRate 0.0094   Epoch: 13   Global Step: 172380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:02:31,818-Speed 3046.49 samples/sec   Loss 4.6193   LearningRate 0.0094   Epoch: 13   Global Step: 172390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 18:02:35,232-Speed 3000.35 samples/sec   Loss 4.5785   LearningRate 0.0094   Epoch: 13   Global Step: 172400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:02:38,707-Speed 2947.03 samples/sec   Loss 4.5932   LearningRate 0.0094   Epoch: 13   Global Step: 172410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:02:42,200-Speed 2932.76 samples/sec   Loss 4.6483   LearningRate 0.0094   Epoch: 13   Global Step: 172420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:02:45,644-Speed 2973.89 samples/sec   Loss 4.6140   LearningRate 0.0094   Epoch: 13   Global Step: 172430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:02:49,081-Speed 2980.12 samples/sec   Loss 4.4759   LearningRate 0.0094   Epoch: 13   Global Step: 172440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:02:52,535-Speed 2966.04 samples/sec   Loss 4.6397   LearningRate 0.0094   Epoch: 13   Global Step: 172450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:02:55,944-Speed 3003.99 samples/sec   Loss 4.5835   LearningRate 0.0093   Epoch: 13   Global Step: 172460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:02:59,313-Speed 3041.08 samples/sec   Loss 4.5299   LearningRate 0.0093   Epoch: 13   Global Step: 172470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:03:02,730-Speed 2997.74 samples/sec   Loss 4.6266   LearningRate 0.0093   Epoch: 13   Global Step: 172480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:03:06,097-Speed 3042.28 samples/sec   Loss 4.5957   LearningRate 0.0093   Epoch: 13   Global Step: 172490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:03:09,516-Speed 2996.30 samples/sec   Loss 4.5375   LearningRate 0.0093   Epoch: 13   Global Step: 172500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:03:12,985-Speed 2952.24 samples/sec   Loss 4.5755   LearningRate 0.0093   Epoch: 13   Global Step: 172510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:03:16,339-Speed 3053.74 samples/sec   Loss 4.5161   LearningRate 0.0093   Epoch: 13   Global Step: 172520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:03:19,676-Speed 3069.74 samples/sec   Loss 4.6087   LearningRate 0.0093   Epoch: 13   Global Step: 172530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:03:23,057-Speed 3029.35 samples/sec   Loss 4.5261   LearningRate 0.0093   Epoch: 13   Global Step: 172540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:03:26,415-Speed 3050.62 samples/sec   Loss 4.4442   LearningRate 0.0093   Epoch: 13   Global Step: 172550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:03:29,801-Speed 3025.09 samples/sec   Loss 4.5070   LearningRate 0.0093   Epoch: 13   Global Step: 172560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:03:33,199-Speed 3014.44 samples/sec   Loss 4.5352   LearningRate 0.0093   Epoch: 13   Global Step: 172570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:03:36,535-Speed 3069.85 samples/sec   Loss 4.5879   LearningRate 0.0093   Epoch: 13   Global Step: 172580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:03:39,947-Speed 3002.95 samples/sec   Loss 4.4983   LearningRate 0.0093   Epoch: 13   Global Step: 172590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:03:43,358-Speed 3003.23 samples/sec   Loss 4.4844   LearningRate 0.0093   Epoch: 13   Global Step: 172600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:03:46,686-Speed 3077.11 samples/sec   Loss 4.5360   LearningRate 0.0093   Epoch: 13   Global Step: 172610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:03:50,056-Speed 3040.22 samples/sec   Loss 4.4313   LearningRate 0.0093   Epoch: 13   Global Step: 172620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:03:53,428-Speed 3037.62 samples/sec   Loss 4.5941   LearningRate 0.0093   Epoch: 13   Global Step: 172630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:03:56,773-Speed 3061.27 samples/sec   Loss 4.6506   LearningRate 0.0093   Epoch: 13   Global Step: 172640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:00,278-Speed 2922.66 samples/sec   Loss 4.4813   LearningRate 0.0093   Epoch: 13   Global Step: 172650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:04:03,757-Speed 2944.51 samples/sec   Loss 4.6032   LearningRate 0.0093   Epoch: 13   Global Step: 172660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:07,186-Speed 2986.41 samples/sec   Loss 4.5598   LearningRate 0.0093   Epoch: 13   Global Step: 172670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:10,549-Speed 3046.17 samples/sec   Loss 4.5996   LearningRate 0.0093   Epoch: 13   Global Step: 172680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:13,937-Speed 3023.71 samples/sec   Loss 4.4634   LearningRate 0.0093   Epoch: 13   Global Step: 172690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:17,360-Speed 2991.85 samples/sec   Loss 4.5287   LearningRate 0.0093   Epoch: 13   Global Step: 172700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:20,728-Speed 3041.36 samples/sec   Loss 4.5571   LearningRate 0.0093   Epoch: 13   Global Step: 172710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:24,055-Speed 3078.23 samples/sec   Loss 4.4660   LearningRate 0.0093   Epoch: 13   Global Step: 172720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:27,443-Speed 3023.31 samples/sec   Loss 4.4898   LearningRate 0.0093   Epoch: 13   Global Step: 172730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:30,788-Speed 3062.37 samples/sec   Loss 4.4745   LearningRate 0.0093   Epoch: 13   Global Step: 172740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:34,214-Speed 2991.02 samples/sec   Loss 4.5428   LearningRate 0.0093   Epoch: 13   Global Step: 172750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:37,575-Speed 3047.93 samples/sec   Loss 4.5478   LearningRate 0.0093   Epoch: 13   Global Step: 172760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:04:40,991-Speed 2998.17 samples/sec   Loss 4.6438   LearningRate 0.0093   Epoch: 13   Global Step: 172770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:04:44,374-Speed 3027.76 samples/sec   Loss 4.6117   LearningRate 0.0093   Epoch: 13   Global Step: 172780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:04:47,812-Speed 2979.30 samples/sec   Loss 4.4416   LearningRate 0.0093   Epoch: 13   Global Step: 172790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:51,280-Speed 2953.85 samples/sec   Loss 4.5951   LearningRate 0.0093   Epoch: 13   Global Step: 172800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:04:54,682-Speed 3011.12 samples/sec   Loss 4.4259   LearningRate 0.0093   Epoch: 13   Global Step: 172810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:04:58,057-Speed 3034.10 samples/sec   Loss 4.4877   LearningRate 0.0093   Epoch: 13   Global Step: 172820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:05:01,429-Speed 3038.11 samples/sec   Loss 4.4870   LearningRate 0.0093   Epoch: 13   Global Step: 172830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:05:04,746-Speed 3088.01 samples/sec   Loss 4.5067   LearningRate 0.0093   Epoch: 13   Global Step: 172840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:05:08,155-Speed 3005.21 samples/sec   Loss 4.5731   LearningRate 0.0093   Epoch: 13   Global Step: 172850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:05:11,571-Speed 2997.89 samples/sec   Loss 4.4731   LearningRate 0.0093   Epoch: 13   Global Step: 172860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:05:14,925-Speed 3054.33 samples/sec   Loss 4.6056   LearningRate 0.0092   Epoch: 13   Global Step: 172870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:05:18,286-Speed 3047.73 samples/sec   Loss 4.5765   LearningRate 0.0092   Epoch: 13   Global Step: 172880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:05:21,641-Speed 3052.73 samples/sec   Loss 4.5387   LearningRate 0.0092   Epoch: 13   Global Step: 172890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:05:25,046-Speed 3008.84 samples/sec   Loss 4.6425   LearningRate 0.0092   Epoch: 13   Global Step: 172900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:05:28,404-Speed 3049.63 samples/sec   Loss 4.5067   LearningRate 0.0092   Epoch: 13   Global Step: 172910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:05:31,778-Speed 3036.16 samples/sec   Loss 4.4243   LearningRate 0.0092   Epoch: 13   Global Step: 172920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:05:35,128-Speed 3057.33 samples/sec   Loss 4.4245   LearningRate 0.0092   Epoch: 13   Global Step: 172930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:05:38,497-Speed 3040.53 samples/sec   Loss 4.6102   LearningRate 0.0092   Epoch: 13   Global Step: 172940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:05:41,897-Speed 3012.76 samples/sec   Loss 4.5483   LearningRate 0.0092   Epoch: 13   Global Step: 172950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:05:45,252-Speed 3052.95 samples/sec   Loss 4.4271   LearningRate 0.0092   Epoch: 13   Global Step: 172960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:05:48,689-Speed 2980.21 samples/sec   Loss 4.6000   LearningRate 0.0092   Epoch: 13   Global Step: 172970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:05:52,021-Speed 3074.38 samples/sec   Loss 4.4857   LearningRate 0.0092   Epoch: 13   Global Step: 172980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:05:55,391-Speed 3039.37 samples/sec   Loss 4.4639   LearningRate 0.0092   Epoch: 13   Global Step: 172990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:05:58,766-Speed 3034.61 samples/sec   Loss 4.6734   LearningRate 0.0092   Epoch: 13   Global Step: 173000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:06:02,096-Speed 3076.44 samples/sec   Loss 4.5923   LearningRate 0.0092   Epoch: 13   Global Step: 173010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:05,458-Speed 3046.46 samples/sec   Loss 4.4956   LearningRate 0.0092   Epoch: 13   Global Step: 173020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:08,836-Speed 3032.43 samples/sec   Loss 4.6191   LearningRate 0.0092   Epoch: 13   Global Step: 173030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:12,241-Speed 3007.80 samples/sec   Loss 4.5035   LearningRate 0.0092   Epoch: 13   Global Step: 173040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:15,641-Speed 3013.36 samples/sec   Loss 4.6390   LearningRate 0.0092   Epoch: 13   Global Step: 173050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:19,065-Speed 2991.50 samples/sec   Loss 4.5430   LearningRate 0.0092   Epoch: 13   Global Step: 173060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:22,458-Speed 3018.61 samples/sec   Loss 4.4948   LearningRate 0.0092   Epoch: 13   Global Step: 173070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:25,911-Speed 2966.52 samples/sec   Loss 4.5622   LearningRate 0.0092   Epoch: 13   Global Step: 173080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:29,255-Speed 3062.50 samples/sec   Loss 4.4386   LearningRate 0.0092   Epoch: 13   Global Step: 173090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:32,619-Speed 3045.12 samples/sec   Loss 4.5883   LearningRate 0.0092   Epoch: 13   Global Step: 173100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:36,003-Speed 3026.78 samples/sec   Loss 4.5556   LearningRate 0.0092   Epoch: 13   Global Step: 173110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:39,337-Speed 3072.59 samples/sec   Loss 4.5144   LearningRate 0.0092   Epoch: 13   Global Step: 173120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:42,675-Speed 3068.96 samples/sec   Loss 4.5160   LearningRate 0.0092   Epoch: 13   Global Step: 173130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:46,088-Speed 3001.56 samples/sec   Loss 4.5019   LearningRate 0.0092   Epoch: 13   Global Step: 173140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:49,435-Speed 3060.17 samples/sec   Loss 4.5021   LearningRate 0.0092   Epoch: 13   Global Step: 173150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:06:52,794-Speed 3049.94 samples/sec   Loss 4.4750   LearningRate 0.0092   Epoch: 13   Global Step: 173160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:06:56,132-Speed 3068.51 samples/sec   Loss 4.5923   LearningRate 0.0092   Epoch: 13   Global Step: 173170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:06:59,502-Speed 3039.45 samples/sec   Loss 4.4426   LearningRate 0.0092   Epoch: 13   Global Step: 173180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:07:02,908-Speed 3006.70 samples/sec   Loss 4.5044   LearningRate 0.0092   Epoch: 13   Global Step: 173190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:07:06,272-Speed 3045.26 samples/sec   Loss 4.5675   LearningRate 0.0092   Epoch: 13   Global Step: 173200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:07:09,729-Speed 2962.56 samples/sec   Loss 4.5509   LearningRate 0.0092   Epoch: 13   Global Step: 173210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:07:13,071-Speed 3065.05 samples/sec   Loss 4.5171   LearningRate 0.0092   Epoch: 13   Global Step: 173220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:07:16,432-Speed 3047.89 samples/sec   Loss 4.4408   LearningRate 0.0092   Epoch: 13   Global Step: 173230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:07:19,841-Speed 3004.40 samples/sec   Loss 4.4880   LearningRate 0.0092   Epoch: 13   Global Step: 173240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:07:23,222-Speed 3029.70 samples/sec   Loss 4.5680   LearningRate 0.0092   Epoch: 13   Global Step: 173250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:07:26,690-Speed 2953.28 samples/sec   Loss 4.5205   LearningRate 0.0092   Epoch: 13   Global Step: 173260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:07:30,176-Speed 2938.77 samples/sec   Loss 4.5457   LearningRate 0.0092   Epoch: 13   Global Step: 173270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:07:33,653-Speed 2946.00 samples/sec   Loss 4.5768   LearningRate 0.0091   Epoch: 13   Global Step: 173280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:07:37,087-Speed 2982.45 samples/sec   Loss 4.5797   LearningRate 0.0091   Epoch: 13   Global Step: 173290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:07:40,457-Speed 3039.07 samples/sec   Loss 4.5520   LearningRate 0.0091   Epoch: 13   Global Step: 173300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:07:43,832-Speed 3035.51 samples/sec   Loss 4.5307   LearningRate 0.0091   Epoch: 13   Global Step: 173310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:07:47,233-Speed 3011.71 samples/sec   Loss 4.5691   LearningRate 0.0091   Epoch: 13   Global Step: 173320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:07:50,617-Speed 3026.81 samples/sec   Loss 4.4582   LearningRate 0.0091   Epoch: 13   Global Step: 173330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:07:54,011-Speed 3017.78 samples/sec   Loss 4.5089   LearningRate 0.0091   Epoch: 13   Global Step: 173340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:07:57,444-Speed 2983.91 samples/sec   Loss 4.5991   LearningRate 0.0091   Epoch: 13   Global Step: 173350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:08:00,766-Speed 3083.28 samples/sec   Loss 4.4692   LearningRate 0.0091   Epoch: 13   Global Step: 173360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:08:04,143-Speed 3033.36 samples/sec   Loss 4.5550   LearningRate 0.0091   Epoch: 13   Global Step: 173370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:08:07,473-Speed 3075.27 samples/sec   Loss 4.5582   LearningRate 0.0091   Epoch: 13   Global Step: 173380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:08:10,842-Speed 3041.17 samples/sec   Loss 4.6458   LearningRate 0.0091   Epoch: 13   Global Step: 173390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:08:14,185-Speed 3063.26 samples/sec   Loss 4.5256   LearningRate 0.0091   Epoch: 13   Global Step: 173400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:08:17,498-Speed 3091.78 samples/sec   Loss 4.5430   LearningRate 0.0091   Epoch: 13   Global Step: 173410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:08:20,845-Speed 3060.41 samples/sec   Loss 4.5415   LearningRate 0.0091   Epoch: 13   Global Step: 173420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:08:24,196-Speed 3057.27 samples/sec   Loss 4.4871   LearningRate 0.0091   Epoch: 13   Global Step: 173430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:08:27,543-Speed 3060.27 samples/sec   Loss 4.5161   LearningRate 0.0091   Epoch: 13   Global Step: 173440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:08:30,958-Speed 2999.65 samples/sec   Loss 4.5820   LearningRate 0.0091   Epoch: 13   Global Step: 173450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:08:34,350-Speed 3019.00 samples/sec   Loss 4.4165   LearningRate 0.0091   Epoch: 13   Global Step: 173460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:08:37,751-Speed 3012.45 samples/sec   Loss 4.4761   LearningRate 0.0091   Epoch: 13   Global Step: 173470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:08:41,206-Speed 2964.71 samples/sec   Loss 4.5315   LearningRate 0.0091   Epoch: 13   Global Step: 173480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:08:44,656-Speed 2968.41 samples/sec   Loss 4.4547   LearningRate 0.0091   Epoch: 13   Global Step: 173490   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:08:48,091-Speed 2982.64 samples/sec   Loss 4.5799   LearningRate 0.0091   Epoch: 13   Global Step: 173500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:08:51,417-Speed 3079.70 samples/sec   Loss 4.6797   LearningRate 0.0091   Epoch: 13   Global Step: 173510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:08:54,816-Speed 3013.43 samples/sec   Loss 4.5538   LearningRate 0.0091   Epoch: 13   Global Step: 173520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:08:58,219-Speed 3010.21 samples/sec   Loss 4.6321   LearningRate 0.0091   Epoch: 13   Global Step: 173530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:09:01,507-Speed 3114.95 samples/sec   Loss 4.5104   LearningRate 0.0091   Epoch: 13   Global Step: 173540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:09:04,822-Speed 3089.79 samples/sec   Loss 4.6263   LearningRate 0.0091   Epoch: 13   Global Step: 173550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:09:08,220-Speed 3013.80 samples/sec   Loss 4.6034   LearningRate 0.0091   Epoch: 13   Global Step: 173560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:09:11,552-Speed 3074.82 samples/sec   Loss 4.5887   LearningRate 0.0091   Epoch: 13   Global Step: 173570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:09:14,986-Speed 2982.08 samples/sec   Loss 4.5084   LearningRate 0.0091   Epoch: 13   Global Step: 173580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:09:18,304-Speed 3088.10 samples/sec   Loss 4.5629   LearningRate 0.0091   Epoch: 13   Global Step: 173590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:09:21,690-Speed 3024.17 samples/sec   Loss 4.5451   LearningRate 0.0091   Epoch: 13   Global Step: 173600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:09:25,068-Speed 3032.95 samples/sec   Loss 4.4561   LearningRate 0.0091   Epoch: 13   Global Step: 173610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:09:28,478-Speed 3003.18 samples/sec   Loss 4.5084   LearningRate 0.0091   Epoch: 13   Global Step: 173620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:09:31,913-Speed 2982.17 samples/sec   Loss 4.4731   LearningRate 0.0091   Epoch: 13   Global Step: 173630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:09:35,263-Speed 3058.21 samples/sec   Loss 4.4850   LearningRate 0.0091   Epoch: 13   Global Step: 173640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:09:38,752-Speed 2935.91 samples/sec   Loss 4.5012   LearningRate 0.0091   Epoch: 13   Global Step: 173650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:09:42,090-Speed 3067.78 samples/sec   Loss 4.5170   LearningRate 0.0091   Epoch: 13   Global Step: 173660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:09:45,486-Speed 3017.29 samples/sec   Loss 4.6095   LearningRate 0.0091   Epoch: 13   Global Step: 173670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:09:48,924-Speed 2979.12 samples/sec   Loss 4.4989   LearningRate 0.0091   Epoch: 13   Global Step: 173680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:09:52,336-Speed 3001.37 samples/sec   Loss 4.5753   LearningRate 0.0090   Epoch: 13   Global Step: 173690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:09:55,675-Speed 3067.29 samples/sec   Loss 4.5065   LearningRate 0.0090   Epoch: 13   Global Step: 173700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:09:59,119-Speed 2974.20 samples/sec   Loss 4.5010   LearningRate 0.0090   Epoch: 13   Global Step: 173710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:02,524-Speed 3008.73 samples/sec   Loss 4.5557   LearningRate 0.0090   Epoch: 13   Global Step: 173720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:05,934-Speed 3003.80 samples/sec   Loss 4.5137   LearningRate 0.0090   Epoch: 13   Global Step: 173730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:09,268-Speed 3072.38 samples/sec   Loss 4.4380   LearningRate 0.0090   Epoch: 13   Global Step: 173740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:10:12,751-Speed 2940.69 samples/sec   Loss 4.5861   LearningRate 0.0090   Epoch: 13   Global Step: 173750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:10:16,206-Speed 2964.93 samples/sec   Loss 4.5157   LearningRate 0.0090   Epoch: 13   Global Step: 173760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:10:19,610-Speed 3009.25 samples/sec   Loss 4.5658   LearningRate 0.0090   Epoch: 13   Global Step: 173770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:23,031-Speed 2994.17 samples/sec   Loss 4.5265   LearningRate 0.0090   Epoch: 13   Global Step: 173780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:26,404-Speed 3036.66 samples/sec   Loss 4.5696   LearningRate 0.0090   Epoch: 13   Global Step: 173790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:29,890-Speed 2938.42 samples/sec   Loss 4.4814   LearningRate 0.0090   Epoch: 13   Global Step: 173800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:33,231-Speed 3065.61 samples/sec   Loss 4.3976   LearningRate 0.0090   Epoch: 13   Global Step: 173810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:36,610-Speed 3030.86 samples/sec   Loss 4.5782   LearningRate 0.0090   Epoch: 13   Global Step: 173820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:39,964-Speed 3054.22 samples/sec   Loss 4.4982   LearningRate 0.0090   Epoch: 13   Global Step: 173830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:43,395-Speed 2985.96 samples/sec   Loss 4.4638   LearningRate 0.0090   Epoch: 13   Global Step: 173840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:46,739-Speed 3063.10 samples/sec   Loss 4.5618   LearningRate 0.0090   Epoch: 13   Global Step: 173850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:50,143-Speed 3008.97 samples/sec   Loss 4.3983   LearningRate 0.0090   Epoch: 13   Global Step: 173860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:10:53,521-Speed 3032.36 samples/sec   Loss 4.5171   LearningRate 0.0090   Epoch: 13   Global Step: 173870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:10:56,856-Speed 3070.93 samples/sec   Loss 4.5482   LearningRate 0.0090   Epoch: 13   Global Step: 173880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:11:00,586-Speed 2746.41 samples/sec   Loss 4.4345   LearningRate 0.0090   Epoch: 13   Global Step: 173890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:11:33,222-Speed 313.78 samples/sec   Loss 3.7024   LearningRate 0.0090   Epoch: 14   Global Step: 173900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:11:36,761-Speed 2894.14 samples/sec   Loss 3.1546   LearningRate 0.0090   Epoch: 14   Global Step: 173910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:11:40,259-Speed 2928.07 samples/sec   Loss 3.1538   LearningRate 0.0090   Epoch: 14   Global Step: 173920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:11:43,635-Speed 3034.22 samples/sec   Loss 3.1666   LearningRate 0.0090   Epoch: 14   Global Step: 173930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:11:47,044-Speed 3004.98 samples/sec   Loss 3.2521   LearningRate 0.0090   Epoch: 14   Global Step: 173940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:11:50,457-Speed 3000.28 samples/sec   Loss 3.1619   LearningRate 0.0090   Epoch: 14   Global Step: 173950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:11:53,831-Speed 3035.86 samples/sec   Loss 3.3285   LearningRate 0.0090   Epoch: 14   Global Step: 173960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:11:57,196-Speed 3051.97 samples/sec   Loss 3.1538   LearningRate 0.0090   Epoch: 14   Global Step: 173970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:12:00,578-Speed 3028.35 samples/sec   Loss 3.2859   LearningRate 0.0090   Epoch: 14   Global Step: 173980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:12:03,938-Speed 3049.02 samples/sec   Loss 3.1419   LearningRate 0.0090   Epoch: 14   Global Step: 173990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:12:07,252-Speed 3090.71 samples/sec   Loss 3.1603   LearningRate 0.0090   Epoch: 14   Global Step: 174000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:12:10,706-Speed 2965.46 samples/sec   Loss 3.2215   LearningRate 0.0090   Epoch: 14   Global Step: 174010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:12:14,113-Speed 3006.43 samples/sec   Loss 3.2194   LearningRate 0.0090   Epoch: 14   Global Step: 174020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:12:17,454-Speed 3065.97 samples/sec   Loss 3.1945   LearningRate 0.0090   Epoch: 14   Global Step: 174030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:12:20,764-Speed 3094.55 samples/sec   Loss 3.2307   LearningRate 0.0090   Epoch: 14   Global Step: 174040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:12:24,088-Speed 3082.01 samples/sec   Loss 3.1493   LearningRate 0.0090   Epoch: 14   Global Step: 174050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:12:27,391-Speed 3101.33 samples/sec   Loss 3.1606   LearningRate 0.0090   Epoch: 14   Global Step: 174060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:12:30,823-Speed 2984.78 samples/sec   Loss 3.2591   LearningRate 0.0090   Epoch: 14   Global Step: 174070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:12:34,178-Speed 3052.49 samples/sec   Loss 3.2395   LearningRate 0.0090   Epoch: 14   Global Step: 174080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:12:37,474-Speed 3107.82 samples/sec   Loss 3.2368   LearningRate 0.0090   Epoch: 14   Global Step: 174090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:12:40,908-Speed 2983.04 samples/sec   Loss 3.2291   LearningRate 0.0090   Epoch: 14   Global Step: 174100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:12:44,362-Speed 2965.21 samples/sec   Loss 3.1594   LearningRate 0.0089   Epoch: 14   Global Step: 174110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:12:47,770-Speed 3005.64 samples/sec   Loss 3.1427   LearningRate 0.0089   Epoch: 14   Global Step: 174120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:12:51,318-Speed 2887.58 samples/sec   Loss 3.2266   LearningRate 0.0089   Epoch: 14   Global Step: 174130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:12:54,730-Speed 3001.72 samples/sec   Loss 3.1891   LearningRate 0.0089   Epoch: 14   Global Step: 174140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:12:58,135-Speed 3007.76 samples/sec   Loss 3.1702   LearningRate 0.0089   Epoch: 14   Global Step: 174150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:13:01,751-Speed 2833.02 samples/sec   Loss 3.1403   LearningRate 0.0089   Epoch: 14   Global Step: 174160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:13:05,189-Speed 2979.50 samples/sec   Loss 3.2608   LearningRate 0.0089   Epoch: 14   Global Step: 174170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:08,626-Speed 2980.00 samples/sec   Loss 3.1745   LearningRate 0.0089   Epoch: 14   Global Step: 174180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:12,073-Speed 2971.75 samples/sec   Loss 3.2314   LearningRate 0.0089   Epoch: 14   Global Step: 174190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:15,554-Speed 2942.80 samples/sec   Loss 3.2309   LearningRate 0.0089   Epoch: 14   Global Step: 174200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:18,952-Speed 3014.35 samples/sec   Loss 3.2731   LearningRate 0.0089   Epoch: 14   Global Step: 174210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:22,328-Speed 3034.44 samples/sec   Loss 3.1329   LearningRate 0.0089   Epoch: 14   Global Step: 174220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:25,783-Speed 2964.19 samples/sec   Loss 3.2533   LearningRate 0.0089   Epoch: 14   Global Step: 174230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:29,224-Speed 2976.75 samples/sec   Loss 3.2234   LearningRate 0.0089   Epoch: 14   Global Step: 174240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:32,630-Speed 3008.70 samples/sec   Loss 3.2932   LearningRate 0.0089   Epoch: 14   Global Step: 174250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:36,040-Speed 3003.90 samples/sec   Loss 3.2671   LearningRate 0.0089   Epoch: 14   Global Step: 174260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:39,414-Speed 3036.62 samples/sec   Loss 3.2361   LearningRate 0.0089   Epoch: 14   Global Step: 174270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:13:42,878-Speed 2956.27 samples/sec   Loss 3.1984   LearningRate 0.0089   Epoch: 14   Global Step: 174280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:13:46,211-Speed 3073.20 samples/sec   Loss 3.2504   LearningRate 0.0089   Epoch: 14   Global Step: 174290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:49,614-Speed 3010.04 samples/sec   Loss 3.2350   LearningRate 0.0089   Epoch: 14   Global Step: 174300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:53,023-Speed 3004.67 samples/sec   Loss 3.2475   LearningRate 0.0089   Epoch: 14   Global Step: 174310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:56,479-Speed 2964.31 samples/sec   Loss 3.2691   LearningRate 0.0089   Epoch: 14   Global Step: 174320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:13:59,839-Speed 3048.04 samples/sec   Loss 3.1281   LearningRate 0.0089   Epoch: 14   Global Step: 174330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:14:03,222-Speed 3027.67 samples/sec   Loss 3.2252   LearningRate 0.0089   Epoch: 14   Global Step: 174340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:14:06,589-Speed 3041.82 samples/sec   Loss 3.2053   LearningRate 0.0089   Epoch: 14   Global Step: 174350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:14:09,974-Speed 3026.61 samples/sec   Loss 3.1899   LearningRate 0.0089   Epoch: 14   Global Step: 174360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:14:13,359-Speed 3025.56 samples/sec   Loss 3.2042   LearningRate 0.0089   Epoch: 14   Global Step: 174370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:14:16,754-Speed 3017.31 samples/sec   Loss 3.2520   LearningRate 0.0089   Epoch: 14   Global Step: 174380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:14:20,110-Speed 3051.83 samples/sec   Loss 3.2278   LearningRate 0.0089   Epoch: 14   Global Step: 174390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:14:23,552-Speed 2975.92 samples/sec   Loss 3.2312   LearningRate 0.0089   Epoch: 14   Global Step: 174400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:14:26,921-Speed 3040.20 samples/sec   Loss 3.2052   LearningRate 0.0089   Epoch: 14   Global Step: 174410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:14:30,247-Speed 3080.18 samples/sec   Loss 3.3202   LearningRate 0.0089   Epoch: 14   Global Step: 174420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:14:33,628-Speed 3029.79 samples/sec   Loss 3.2510   LearningRate 0.0089   Epoch: 14   Global Step: 174430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:14:36,991-Speed 3045.30 samples/sec   Loss 3.2714   LearningRate 0.0089   Epoch: 14   Global Step: 174440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:14:40,343-Speed 3056.23 samples/sec   Loss 3.3764   LearningRate 0.0089   Epoch: 14   Global Step: 174450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:14:43,743-Speed 3012.44 samples/sec   Loss 3.2234   LearningRate 0.0089   Epoch: 14   Global Step: 174460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:14:47,207-Speed 2957.00 samples/sec   Loss 3.3707   LearningRate 0.0089   Epoch: 14   Global Step: 174470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:14:50,611-Speed 3009.41 samples/sec   Loss 3.3143   LearningRate 0.0089   Epoch: 14   Global Step: 174480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:14:53,995-Speed 3026.61 samples/sec   Loss 3.3000   LearningRate 0.0089   Epoch: 14   Global Step: 174490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:14:57,346-Speed 3056.57 samples/sec   Loss 3.2785   LearningRate 0.0089   Epoch: 14   Global Step: 174500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:00,688-Speed 3065.00 samples/sec   Loss 3.2938   LearningRate 0.0089   Epoch: 14   Global Step: 174510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:04,057-Speed 3040.69 samples/sec   Loss 3.2210   LearningRate 0.0088   Epoch: 14   Global Step: 174520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:07,499-Speed 2975.20 samples/sec   Loss 3.2883   LearningRate 0.0088   Epoch: 14   Global Step: 174530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:15:10,892-Speed 3019.88 samples/sec   Loss 3.3325   LearningRate 0.0088   Epoch: 14   Global Step: 174540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:15:14,272-Speed 3030.00 samples/sec   Loss 3.3468   LearningRate 0.0088   Epoch: 14   Global Step: 174550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:17,764-Speed 2933.21 samples/sec   Loss 3.3311   LearningRate 0.0088   Epoch: 14   Global Step: 174560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:21,260-Speed 2929.65 samples/sec   Loss 3.2375   LearningRate 0.0088   Epoch: 14   Global Step: 174570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:24,669-Speed 3005.17 samples/sec   Loss 3.3462   LearningRate 0.0088   Epoch: 14   Global Step: 174580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:28,171-Speed 2925.16 samples/sec   Loss 3.3751   LearningRate 0.0088   Epoch: 14   Global Step: 174590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:31,589-Speed 2996.79 samples/sec   Loss 3.2165   LearningRate 0.0088   Epoch: 14   Global Step: 174600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:35,010-Speed 2993.60 samples/sec   Loss 3.4115   LearningRate 0.0088   Epoch: 14   Global Step: 174610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:38,448-Speed 2979.24 samples/sec   Loss 3.3624   LearningRate 0.0088   Epoch: 14   Global Step: 174620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:41,821-Speed 3037.20 samples/sec   Loss 3.3192   LearningRate 0.0088   Epoch: 14   Global Step: 174630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:45,245-Speed 2991.56 samples/sec   Loss 3.3591   LearningRate 0.0088   Epoch: 14   Global Step: 174640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:15:48,654-Speed 3003.94 samples/sec   Loss 3.3462   LearningRate 0.0088   Epoch: 14   Global Step: 174650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:15:52,049-Speed 3017.63 samples/sec   Loss 3.3692   LearningRate 0.0088   Epoch: 14   Global Step: 174660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:15:55,469-Speed 2994.23 samples/sec   Loss 3.2942   LearningRate 0.0088   Epoch: 14   Global Step: 174670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:15:58,891-Speed 2994.06 samples/sec   Loss 3.1691   LearningRate 0.0088   Epoch: 14   Global Step: 174680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:16:02,243-Speed 3055.17 samples/sec   Loss 3.3777   LearningRate 0.0088   Epoch: 14   Global Step: 174690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:16:05,647-Speed 3009.21 samples/sec   Loss 3.3656   LearningRate 0.0088   Epoch: 14   Global Step: 174700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:16:09,019-Speed 3038.04 samples/sec   Loss 3.3661   LearningRate 0.0088   Epoch: 14   Global Step: 174710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:16:12,403-Speed 3026.60 samples/sec   Loss 3.3578   LearningRate 0.0088   Epoch: 14   Global Step: 174720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:16:15,779-Speed 3033.83 samples/sec   Loss 3.3584   LearningRate 0.0088   Epoch: 14   Global Step: 174730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:16:19,210-Speed 2986.16 samples/sec   Loss 3.3715   LearningRate 0.0088   Epoch: 14   Global Step: 174740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:16:22,606-Speed 3015.89 samples/sec   Loss 3.3234   LearningRate 0.0088   Epoch: 14   Global Step: 174750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 18:16:25,994-Speed 3023.39 samples/sec   Loss 3.3336   LearningRate 0.0088   Epoch: 14   Global Step: 174760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:16:29,341-Speed 3059.96 samples/sec   Loss 3.4362   LearningRate 0.0088   Epoch: 14   Global Step: 174770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:16:32,726-Speed 3026.17 samples/sec   Loss 3.3184   LearningRate 0.0088   Epoch: 14   Global Step: 174780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:16:36,085-Speed 3049.56 samples/sec   Loss 3.3025   LearningRate 0.0088   Epoch: 14   Global Step: 174790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:16:39,446-Speed 3047.24 samples/sec   Loss 3.3760   LearningRate 0.0088   Epoch: 14   Global Step: 174800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:16:42,763-Speed 3088.06 samples/sec   Loss 3.3511   LearningRate 0.0088   Epoch: 14   Global Step: 174810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:16:46,154-Speed 3020.46 samples/sec   Loss 3.2973   LearningRate 0.0088   Epoch: 14   Global Step: 174820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:16:49,619-Speed 2956.41 samples/sec   Loss 3.3943   LearningRate 0.0088   Epoch: 14   Global Step: 174830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:16:52,991-Speed 3037.16 samples/sec   Loss 3.3453   LearningRate 0.0088   Epoch: 14   Global Step: 174840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:16:56,512-Speed 2910.03 samples/sec   Loss 3.4114   LearningRate 0.0088   Epoch: 14   Global Step: 174850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:16:59,868-Speed 3051.63 samples/sec   Loss 3.4121   LearningRate 0.0088   Epoch: 14   Global Step: 174860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:17:03,250-Speed 3028.76 samples/sec   Loss 3.3954   LearningRate 0.0088   Epoch: 14   Global Step: 174870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:17:06,622-Speed 3037.40 samples/sec   Loss 3.3864   LearningRate 0.0088   Epoch: 14   Global Step: 174880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:17:09,980-Speed 3050.46 samples/sec   Loss 3.3597   LearningRate 0.0088   Epoch: 14   Global Step: 174890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:17:13,407-Speed 2989.29 samples/sec   Loss 3.3520   LearningRate 0.0088   Epoch: 14   Global Step: 174900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:17:16,759-Speed 3055.34 samples/sec   Loss 3.4080   LearningRate 0.0088   Epoch: 14   Global Step: 174910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:17:20,194-Speed 2982.18 samples/sec   Loss 3.3298   LearningRate 0.0088   Epoch: 14   Global Step: 174920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:17:23,621-Speed 2989.07 samples/sec   Loss 3.3758   LearningRate 0.0088   Epoch: 14   Global Step: 174930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:17:26,942-Speed 3084.50 samples/sec   Loss 3.3694   LearningRate 0.0087   Epoch: 14   Global Step: 174940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:17:30,317-Speed 3035.05 samples/sec   Loss 3.3624   LearningRate 0.0087   Epoch: 14   Global Step: 174950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:17:33,806-Speed 2935.21 samples/sec   Loss 3.3866   LearningRate 0.0087   Epoch: 14   Global Step: 174960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:17:37,231-Speed 2990.58 samples/sec   Loss 3.3426   LearningRate 0.0087   Epoch: 14   Global Step: 174970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:17:40,564-Speed 3073.62 samples/sec   Loss 3.3801   LearningRate 0.0087   Epoch: 14   Global Step: 174980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:17:43,863-Speed 3104.44 samples/sec   Loss 3.3983   LearningRate 0.0087   Epoch: 14   Global Step: 174990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:17:47,330-Speed 2954.84 samples/sec   Loss 3.3546   LearningRate 0.0087   Epoch: 14   Global Step: 175000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:17:50,745-Speed 2998.81 samples/sec   Loss 3.3771   LearningRate 0.0087   Epoch: 14   Global Step: 175010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:17:54,142-Speed 3016.09 samples/sec   Loss 3.3378   LearningRate 0.0087   Epoch: 14   Global Step: 175020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:17:57,477-Speed 3070.85 samples/sec   Loss 3.4492   LearningRate 0.0087   Epoch: 14   Global Step: 175030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:18:00,829-Speed 3055.57 samples/sec   Loss 3.3632   LearningRate 0.0087   Epoch: 14   Global Step: 175040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:18:04,236-Speed 3006.94 samples/sec   Loss 3.3293   LearningRate 0.0087   Epoch: 14   Global Step: 175050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:18:07,590-Speed 3053.12 samples/sec   Loss 3.4576   LearningRate 0.0087   Epoch: 14   Global Step: 175060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:18:10,899-Speed 3095.94 samples/sec   Loss 3.3423   LearningRate 0.0087   Epoch: 14   Global Step: 175070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:18:14,250-Speed 3057.34 samples/sec   Loss 3.4438   LearningRate 0.0087   Epoch: 14   Global Step: 175080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:18:17,683-Speed 2983.12 samples/sec   Loss 3.3301   LearningRate 0.0087   Epoch: 14   Global Step: 175090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:18:21,150-Speed 2954.37 samples/sec   Loss 3.3359   LearningRate 0.0087   Epoch: 14   Global Step: 175100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:18:24,613-Speed 2957.50 samples/sec   Loss 3.4128   LearningRate 0.0087   Epoch: 14   Global Step: 175110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:18:27,944-Speed 3075.24 samples/sec   Loss 3.3969   LearningRate 0.0087   Epoch: 14   Global Step: 175120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:18:31,318-Speed 3036.49 samples/sec   Loss 3.3794   LearningRate 0.0087   Epoch: 14   Global Step: 175130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:18:34,716-Speed 3013.76 samples/sec   Loss 3.3609   LearningRate 0.0087   Epoch: 14   Global Step: 175140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:18:38,051-Speed 3071.51 samples/sec   Loss 3.4602   LearningRate 0.0087   Epoch: 14   Global Step: 175150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:18:41,493-Speed 2976.09 samples/sec   Loss 3.3698   LearningRate 0.0087   Epoch: 14   Global Step: 175160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:18:44,916-Speed 2992.04 samples/sec   Loss 3.5015   LearningRate 0.0087   Epoch: 14   Global Step: 175170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:18:48,346-Speed 2985.94 samples/sec   Loss 3.3669   LearningRate 0.0087   Epoch: 14   Global Step: 175180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:18:51,765-Speed 2996.49 samples/sec   Loss 3.2695   LearningRate 0.0087   Epoch: 14   Global Step: 175190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:18:55,072-Speed 3096.67 samples/sec   Loss 3.4152   LearningRate 0.0087   Epoch: 14   Global Step: 175200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:18:58,445-Speed 3037.14 samples/sec   Loss 3.4191   LearningRate 0.0087   Epoch: 14   Global Step: 175210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:01,864-Speed 2995.89 samples/sec   Loss 3.4305   LearningRate 0.0087   Epoch: 14   Global Step: 175220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:05,280-Speed 2998.53 samples/sec   Loss 3.4408   LearningRate 0.0087   Epoch: 14   Global Step: 175230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:08,715-Speed 2981.55 samples/sec   Loss 3.3377   LearningRate 0.0087   Epoch: 14   Global Step: 175240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:12,186-Speed 2951.08 samples/sec   Loss 3.4920   LearningRate 0.0087   Epoch: 14   Global Step: 175250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:15,632-Speed 2972.28 samples/sec   Loss 3.3534   LearningRate 0.0087   Epoch: 14   Global Step: 175260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:19,126-Speed 2931.99 samples/sec   Loss 3.3780   LearningRate 0.0087   Epoch: 14   Global Step: 175270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:22,532-Speed 3006.85 samples/sec   Loss 3.4503   LearningRate 0.0087   Epoch: 14   Global Step: 175280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:25,931-Speed 3014.44 samples/sec   Loss 3.3319   LearningRate 0.0087   Epoch: 14   Global Step: 175290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:29,273-Speed 3064.78 samples/sec   Loss 3.3728   LearningRate 0.0087   Epoch: 14   Global Step: 175300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:32,609-Speed 3070.74 samples/sec   Loss 3.4190   LearningRate 0.0087   Epoch: 14   Global Step: 175310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:36,078-Speed 2951.86 samples/sec   Loss 3.3555   LearningRate 0.0087   Epoch: 14   Global Step: 175320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:39,444-Speed 3043.15 samples/sec   Loss 3.4203   LearningRate 0.0087   Epoch: 14   Global Step: 175330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:42,800-Speed 3052.53 samples/sec   Loss 3.4024   LearningRate 0.0087   Epoch: 14   Global Step: 175340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:46,173-Speed 3036.29 samples/sec   Loss 3.3867   LearningRate 0.0087   Epoch: 14   Global Step: 175350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:19:49,549-Speed 3034.08 samples/sec   Loss 3.3967   LearningRate 0.0086   Epoch: 14   Global Step: 175360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:19:53,031-Speed 2941.24 samples/sec   Loss 3.4540   LearningRate 0.0086   Epoch: 14   Global Step: 175370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:19:56,375-Speed 3063.68 samples/sec   Loss 3.4770   LearningRate 0.0086   Epoch: 14   Global Step: 175380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:19:59,722-Speed 3060.07 samples/sec   Loss 3.3797   LearningRate 0.0086   Epoch: 14   Global Step: 175390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:20:03,063-Speed 3065.08 samples/sec   Loss 3.3936   LearningRate 0.0086   Epoch: 14   Global Step: 175400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:20:06,412-Speed 3058.43 samples/sec   Loss 3.4292   LearningRate 0.0086   Epoch: 14   Global Step: 175410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:20:09,862-Speed 2969.59 samples/sec   Loss 3.4300   LearningRate 0.0086   Epoch: 14   Global Step: 175420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:20:13,206-Speed 3062.94 samples/sec   Loss 3.4099   LearningRate 0.0086   Epoch: 14   Global Step: 175430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:20:16,585-Speed 3031.22 samples/sec   Loss 3.4244   LearningRate 0.0086   Epoch: 14   Global Step: 175440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:20:19,928-Speed 3064.61 samples/sec   Loss 3.5758   LearningRate 0.0086   Epoch: 14   Global Step: 175450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:20:23,271-Speed 3063.75 samples/sec   Loss 3.4546   LearningRate 0.0086   Epoch: 14   Global Step: 175460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:20:26,678-Speed 3006.76 samples/sec   Loss 3.4124   LearningRate 0.0086   Epoch: 14   Global Step: 175470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:20:30,005-Speed 3079.05 samples/sec   Loss 3.4866   LearningRate 0.0086   Epoch: 14   Global Step: 175480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:20:33,399-Speed 3017.03 samples/sec   Loss 3.4745   LearningRate 0.0086   Epoch: 14   Global Step: 175490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:20:36,754-Speed 3053.25 samples/sec   Loss 3.3884   LearningRate 0.0086   Epoch: 14   Global Step: 175500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:20:40,107-Speed 3055.09 samples/sec   Loss 3.4917   LearningRate 0.0086   Epoch: 14   Global Step: 175510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:20:43,490-Speed 3027.84 samples/sec   Loss 3.4526   LearningRate 0.0086   Epoch: 14   Global Step: 175520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:20:46,833-Speed 3063.85 samples/sec   Loss 3.4587   LearningRate 0.0086   Epoch: 14   Global Step: 175530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:20:50,188-Speed 3053.13 samples/sec   Loss 3.4802   LearningRate 0.0086   Epoch: 14   Global Step: 175540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:20:53,517-Speed 3076.37 samples/sec   Loss 3.5169   LearningRate 0.0086   Epoch: 14   Global Step: 175550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:20:56,961-Speed 2973.98 samples/sec   Loss 3.4546   LearningRate 0.0086   Epoch: 14   Global Step: 175560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:21:00,357-Speed 3016.30 samples/sec   Loss 3.4900   LearningRate 0.0086   Epoch: 14   Global Step: 175570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:21:03,745-Speed 3023.00 samples/sec   Loss 3.4677   LearningRate 0.0086   Epoch: 14   Global Step: 175580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:21:07,181-Speed 2980.62 samples/sec   Loss 3.5289   LearningRate 0.0086   Epoch: 14   Global Step: 175590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:21:10,493-Speed 3092.95 samples/sec   Loss 3.4911   LearningRate 0.0086   Epoch: 14   Global Step: 175600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:21:13,807-Speed 3090.72 samples/sec   Loss 3.5114   LearningRate 0.0086   Epoch: 14   Global Step: 175610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:21:17,141-Speed 3072.80 samples/sec   Loss 3.4651   LearningRate 0.0086   Epoch: 14   Global Step: 175620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:21:20,509-Speed 3041.42 samples/sec   Loss 3.5288   LearningRate 0.0086   Epoch: 14   Global Step: 175630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:21:23,838-Speed 3077.03 samples/sec   Loss 3.5315   LearningRate 0.0086   Epoch: 14   Global Step: 175640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:21:27,214-Speed 3034.14 samples/sec   Loss 3.5652   LearningRate 0.0086   Epoch: 14   Global Step: 175650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:21:30,623-Speed 3004.55 samples/sec   Loss 3.5328   LearningRate 0.0086   Epoch: 14   Global Step: 175660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:21:34,078-Speed 2964.44 samples/sec   Loss 3.4641   LearningRate 0.0086   Epoch: 14   Global Step: 175670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:21:37,467-Speed 3023.21 samples/sec   Loss 3.5136   LearningRate 0.0086   Epoch: 14   Global Step: 175680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:21:40,858-Speed 3021.02 samples/sec   Loss 3.4452   LearningRate 0.0086   Epoch: 14   Global Step: 175690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:21:44,174-Speed 3089.19 samples/sec   Loss 3.4791   LearningRate 0.0086   Epoch: 14   Global Step: 175700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:21:47,533-Speed 3048.52 samples/sec   Loss 3.5113   LearningRate 0.0086   Epoch: 14   Global Step: 175710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:21:50,895-Speed 3047.28 samples/sec   Loss 3.4561   LearningRate 0.0086   Epoch: 14   Global Step: 175720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:21:54,228-Speed 3073.23 samples/sec   Loss 3.5174   LearningRate 0.0086   Epoch: 14   Global Step: 175730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:21:57,578-Speed 3058.60 samples/sec   Loss 3.4535   LearningRate 0.0086   Epoch: 14   Global Step: 175740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:22:01,035-Speed 2963.02 samples/sec   Loss 3.4602   LearningRate 0.0086   Epoch: 14   Global Step: 175750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:22:04,464-Speed 2987.37 samples/sec   Loss 3.4630   LearningRate 0.0086   Epoch: 14   Global Step: 175760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:07,811-Speed 3060.41 samples/sec   Loss 3.5036   LearningRate 0.0086   Epoch: 14   Global Step: 175770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:11,189-Speed 3032.82 samples/sec   Loss 3.4915   LearningRate 0.0086   Epoch: 14   Global Step: 175780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:14,590-Speed 3011.85 samples/sec   Loss 3.4449   LearningRate 0.0085   Epoch: 14   Global Step: 175790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:17,999-Speed 3004.63 samples/sec   Loss 3.4450   LearningRate 0.0085   Epoch: 14   Global Step: 175800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:21,390-Speed 3021.32 samples/sec   Loss 3.6065   LearningRate 0.0085   Epoch: 14   Global Step: 175810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:24,814-Speed 2990.94 samples/sec   Loss 3.5145   LearningRate 0.0085   Epoch: 14   Global Step: 175820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:28,224-Speed 3003.91 samples/sec   Loss 3.5988   LearningRate 0.0085   Epoch: 14   Global Step: 175830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:31,590-Speed 3043.19 samples/sec   Loss 3.6094   LearningRate 0.0085   Epoch: 14   Global Step: 175840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:34,992-Speed 3011.19 samples/sec   Loss 3.4747   LearningRate 0.0085   Epoch: 14   Global Step: 175850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:38,404-Speed 3001.32 samples/sec   Loss 3.5525   LearningRate 0.0085   Epoch: 14   Global Step: 175860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:22:41,876-Speed 2950.88 samples/sec   Loss 3.4802   LearningRate 0.0085   Epoch: 14   Global Step: 175870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:22:45,206-Speed 3075.60 samples/sec   Loss 3.4677   LearningRate 0.0085   Epoch: 14   Global Step: 175880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:22:48,542-Speed 3070.77 samples/sec   Loss 3.5273   LearningRate 0.0085   Epoch: 14   Global Step: 175890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:51,864-Speed 3083.43 samples/sec   Loss 3.5573   LearningRate 0.0085   Epoch: 14   Global Step: 175900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:55,202-Speed 3068.93 samples/sec   Loss 3.6100   LearningRate 0.0085   Epoch: 14   Global Step: 175910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:22:58,603-Speed 3011.98 samples/sec   Loss 3.4739   LearningRate 0.0085   Epoch: 14   Global Step: 175920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:01,938-Speed 3072.92 samples/sec   Loss 3.4653   LearningRate 0.0085   Epoch: 14   Global Step: 175930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:05,331-Speed 3018.93 samples/sec   Loss 3.4825   LearningRate 0.0085   Epoch: 14   Global Step: 175940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:08,706-Speed 3035.39 samples/sec   Loss 3.6235   LearningRate 0.0085   Epoch: 14   Global Step: 175950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:12,097-Speed 3020.13 samples/sec   Loss 3.5202   LearningRate 0.0085   Epoch: 14   Global Step: 175960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:15,451-Speed 3054.60 samples/sec   Loss 3.5749   LearningRate 0.0085   Epoch: 14   Global Step: 175970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:18,853-Speed 3010.33 samples/sec   Loss 3.4934   LearningRate 0.0085   Epoch: 14   Global Step: 175980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:22,162-Speed 3095.41 samples/sec   Loss 3.6107   LearningRate 0.0085   Epoch: 14   Global Step: 175990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:23:25,530-Speed 3041.86 samples/sec   Loss 3.5462   LearningRate 0.0085   Epoch: 14   Global Step: 176000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:28,922-Speed 3020.66 samples/sec   Loss 3.5550   LearningRate 0.0085   Epoch: 14   Global Step: 176010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:32,355-Speed 2983.39 samples/sec   Loss 3.5385   LearningRate 0.0085   Epoch: 14   Global Step: 176020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:35,727-Speed 3037.70 samples/sec   Loss 3.5114   LearningRate 0.0085   Epoch: 14   Global Step: 176030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:39,118-Speed 3020.42 samples/sec   Loss 3.5802   LearningRate 0.0085   Epoch: 14   Global Step: 176040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:42,544-Speed 2990.93 samples/sec   Loss 3.5743   LearningRate 0.0085   Epoch: 14   Global Step: 176050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:45,991-Speed 2970.88 samples/sec   Loss 3.6394   LearningRate 0.0085   Epoch: 14   Global Step: 176060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:49,412-Speed 2994.32 samples/sec   Loss 3.5883   LearningRate 0.0085   Epoch: 14   Global Step: 176070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:52,822-Speed 3004.14 samples/sec   Loss 3.6187   LearningRate 0.0085   Epoch: 14   Global Step: 176080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:56,199-Speed 3032.96 samples/sec   Loss 3.5501   LearningRate 0.0085   Epoch: 14   Global Step: 176090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:23:59,607-Speed 3005.00 samples/sec   Loss 3.5673   LearningRate 0.0085   Epoch: 14   Global Step: 176100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:24:03,058-Speed 2968.84 samples/sec   Loss 3.5107   LearningRate 0.0085   Epoch: 14   Global Step: 176110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:24:06,462-Speed 3008.51 samples/sec   Loss 3.7465   LearningRate 0.0085   Epoch: 14   Global Step: 176120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:24:09,871-Speed 3004.88 samples/sec   Loss 3.6149   LearningRate 0.0085   Epoch: 14   Global Step: 176130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:24:13,239-Speed 3041.83 samples/sec   Loss 3.5107   LearningRate 0.0085   Epoch: 14   Global Step: 176140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:24:16,669-Speed 2985.91 samples/sec   Loss 3.4903   LearningRate 0.0085   Epoch: 14   Global Step: 176150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:24:20,057-Speed 3023.54 samples/sec   Loss 3.5364   LearningRate 0.0085   Epoch: 14   Global Step: 176160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:24:23,502-Speed 2973.44 samples/sec   Loss 3.4903   LearningRate 0.0085   Epoch: 14   Global Step: 176170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:24:26,821-Speed 3085.90 samples/sec   Loss 3.4422   LearningRate 0.0085   Epoch: 14   Global Step: 176180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:24:30,187-Speed 3042.75 samples/sec   Loss 3.5469   LearningRate 0.0085   Epoch: 14   Global Step: 176190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:24:33,502-Speed 3090.47 samples/sec   Loss 3.6166   LearningRate 0.0085   Epoch: 14   Global Step: 176200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:24:36,910-Speed 3005.38 samples/sec   Loss 3.5497   LearningRate 0.0084   Epoch: 14   Global Step: 176210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:24:40,271-Speed 3047.58 samples/sec   Loss 3.5527   LearningRate 0.0084   Epoch: 14   Global Step: 176220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:24:43,627-Speed 3052.32 samples/sec   Loss 3.5942   LearningRate 0.0084   Epoch: 14   Global Step: 176230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:24:47,083-Speed 2963.20 samples/sec   Loss 3.6274   LearningRate 0.0084   Epoch: 14   Global Step: 176240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:24:50,466-Speed 3029.21 samples/sec   Loss 3.6332   LearningRate 0.0084   Epoch: 14   Global Step: 176250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:24:53,871-Speed 3007.40 samples/sec   Loss 3.5383   LearningRate 0.0084   Epoch: 14   Global Step: 176260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:24:57,339-Speed 2953.34 samples/sec   Loss 3.6332   LearningRate 0.0084   Epoch: 14   Global Step: 176270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:25:00,817-Speed 2945.78 samples/sec   Loss 3.5342   LearningRate 0.0084   Epoch: 14   Global Step: 176280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:25:04,216-Speed 3013.24 samples/sec   Loss 3.5210   LearningRate 0.0084   Epoch: 14   Global Step: 176290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:25:07,577-Speed 3047.13 samples/sec   Loss 3.5813   LearningRate 0.0084   Epoch: 14   Global Step: 176300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:25:11,055-Speed 2945.45 samples/sec   Loss 3.5482   LearningRate 0.0084   Epoch: 14   Global Step: 176310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:25:14,356-Speed 3102.49 samples/sec   Loss 3.5225   LearningRate 0.0084   Epoch: 14   Global Step: 176320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:25:17,846-Speed 2934.41 samples/sec   Loss 3.5698   LearningRate 0.0084   Epoch: 14   Global Step: 176330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:25:21,244-Speed 3015.01 samples/sec   Loss 3.7278   LearningRate 0.0084   Epoch: 14   Global Step: 176340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:25:24,719-Speed 2947.57 samples/sec   Loss 3.5624   LearningRate 0.0084   Epoch: 14   Global Step: 176350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:25:28,117-Speed 3013.92 samples/sec   Loss 3.4949   LearningRate 0.0084   Epoch: 14   Global Step: 176360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:25:31,567-Speed 2969.51 samples/sec   Loss 3.5876   LearningRate 0.0084   Epoch: 14   Global Step: 176370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:25:35,012-Speed 2973.05 samples/sec   Loss 3.6556   LearningRate 0.0084   Epoch: 14   Global Step: 176380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:25:38,515-Speed 2924.83 samples/sec   Loss 3.6876   LearningRate 0.0084   Epoch: 14   Global Step: 176390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:25:41,897-Speed 3027.87 samples/sec   Loss 3.5154   LearningRate 0.0084   Epoch: 14   Global Step: 176400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:25:45,313-Speed 2999.62 samples/sec   Loss 3.5609   LearningRate 0.0084   Epoch: 14   Global Step: 176410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:25:48,768-Speed 2964.44 samples/sec   Loss 3.6122   LearningRate 0.0084   Epoch: 14   Global Step: 176420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:25:52,159-Speed 3020.41 samples/sec   Loss 3.6018   LearningRate 0.0084   Epoch: 14   Global Step: 176430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:25:55,623-Speed 2956.99 samples/sec   Loss 3.7115   LearningRate 0.0084   Epoch: 14   Global Step: 176440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:25:58,979-Speed 3052.59 samples/sec   Loss 3.6156   LearningRate 0.0084   Epoch: 14   Global Step: 176450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:26:02,398-Speed 2996.10 samples/sec   Loss 3.5820   LearningRate 0.0084   Epoch: 14   Global Step: 176460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:26:05,821-Speed 2991.68 samples/sec   Loss 3.6069   LearningRate 0.0084   Epoch: 14   Global Step: 176470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:26:09,336-Speed 2914.82 samples/sec   Loss 3.5962   LearningRate 0.0084   Epoch: 14   Global Step: 176480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:26:12,805-Speed 2952.77 samples/sec   Loss 3.5549   LearningRate 0.0084   Epoch: 14   Global Step: 176490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:26:16,161-Speed 3051.25 samples/sec   Loss 3.5555   LearningRate 0.0084   Epoch: 14   Global Step: 176500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:26:19,561-Speed 3013.15 samples/sec   Loss 3.6452   LearningRate 0.0084   Epoch: 14   Global Step: 176510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:26:22,966-Speed 3008.27 samples/sec   Loss 3.5695   LearningRate 0.0084   Epoch: 14   Global Step: 176520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:26:26,379-Speed 3000.95 samples/sec   Loss 3.4619   LearningRate 0.0084   Epoch: 14   Global Step: 176530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:26:29,766-Speed 3023.96 samples/sec   Loss 3.6038   LearningRate 0.0084   Epoch: 14   Global Step: 176540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:26:33,240-Speed 2948.84 samples/sec   Loss 3.7119   LearningRate 0.0084   Epoch: 14   Global Step: 176550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:26:36,672-Speed 2984.19 samples/sec   Loss 3.6200   LearningRate 0.0084   Epoch: 14   Global Step: 176560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:26:40,048-Speed 3035.20 samples/sec   Loss 3.5344   LearningRate 0.0084   Epoch: 14   Global Step: 176570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:26:43,432-Speed 3026.74 samples/sec   Loss 3.5345   LearningRate 0.0084   Epoch: 14   Global Step: 176580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:26:46,794-Speed 3045.89 samples/sec   Loss 3.5501   LearningRate 0.0084   Epoch: 14   Global Step: 176590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:26:50,178-Speed 3027.54 samples/sec   Loss 3.5396   LearningRate 0.0084   Epoch: 14   Global Step: 176600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:26:53,572-Speed 3017.75 samples/sec   Loss 3.6405   LearningRate 0.0084   Epoch: 14   Global Step: 176610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:26:57,005-Speed 2982.96 samples/sec   Loss 3.5942   LearningRate 0.0084   Epoch: 14   Global Step: 176620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:00,356-Speed 3057.68 samples/sec   Loss 3.6074   LearningRate 0.0084   Epoch: 14   Global Step: 176630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:03,722-Speed 3042.67 samples/sec   Loss 3.6056   LearningRate 0.0083   Epoch: 14   Global Step: 176640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:07,066-Speed 3062.99 samples/sec   Loss 3.6283   LearningRate 0.0083   Epoch: 14   Global Step: 176650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:10,522-Speed 2963.67 samples/sec   Loss 3.5515   LearningRate 0.0083   Epoch: 14   Global Step: 176660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:13,933-Speed 3002.68 samples/sec   Loss 3.6550   LearningRate 0.0083   Epoch: 14   Global Step: 176670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:17,345-Speed 3002.06 samples/sec   Loss 3.6340   LearningRate 0.0083   Epoch: 14   Global Step: 176680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:20,824-Speed 2944.99 samples/sec   Loss 3.5566   LearningRate 0.0083   Epoch: 14   Global Step: 176690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:24,241-Speed 2997.57 samples/sec   Loss 3.6893   LearningRate 0.0083   Epoch: 14   Global Step: 176700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:27,579-Speed 3067.81 samples/sec   Loss 3.5666   LearningRate 0.0083   Epoch: 14   Global Step: 176710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:30,953-Speed 3036.82 samples/sec   Loss 3.6034   LearningRate 0.0083   Epoch: 14   Global Step: 176720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:34,420-Speed 2953.55 samples/sec   Loss 3.6403   LearningRate 0.0083   Epoch: 14   Global Step: 176730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:37,803-Speed 3028.56 samples/sec   Loss 3.5965   LearningRate 0.0083   Epoch: 14   Global Step: 176740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:41,207-Speed 3009.19 samples/sec   Loss 3.5697   LearningRate 0.0083   Epoch: 14   Global Step: 176750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:27:44,554-Speed 3059.84 samples/sec   Loss 3.6154   LearningRate 0.0083   Epoch: 14   Global Step: 176760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:47,942-Speed 3023.55 samples/sec   Loss 3.6046   LearningRate 0.0083   Epoch: 14   Global Step: 176770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:51,341-Speed 3013.75 samples/sec   Loss 3.6039   LearningRate 0.0083   Epoch: 14   Global Step: 176780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:54,670-Speed 3076.74 samples/sec   Loss 3.6173   LearningRate 0.0083   Epoch: 14   Global Step: 176790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:27:58,032-Speed 3046.41 samples/sec   Loss 3.6130   LearningRate 0.0083   Epoch: 14   Global Step: 176800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:01,422-Speed 3021.94 samples/sec   Loss 3.5364   LearningRate 0.0083   Epoch: 14   Global Step: 176810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:05,678-Speed 2406.50 samples/sec   Loss 3.6073   LearningRate 0.0083   Epoch: 14   Global Step: 176820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:09,083-Speed 3008.84 samples/sec   Loss 3.5687   LearningRate 0.0083   Epoch: 14   Global Step: 176830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:12,491-Speed 3004.87 samples/sec   Loss 3.6128   LearningRate 0.0083   Epoch: 14   Global Step: 176840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:15,879-Speed 3023.61 samples/sec   Loss 3.6832   LearningRate 0.0083   Epoch: 14   Global Step: 176850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:19,275-Speed 3016.48 samples/sec   Loss 3.7031   LearningRate 0.0083   Epoch: 14   Global Step: 176860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:28:22,629-Speed 3053.56 samples/sec   Loss 3.6902   LearningRate 0.0083   Epoch: 14   Global Step: 176870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:26,051-Speed 2992.73 samples/sec   Loss 3.7016   LearningRate 0.0083   Epoch: 14   Global Step: 176880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:29,471-Speed 2995.01 samples/sec   Loss 3.6060   LearningRate 0.0083   Epoch: 14   Global Step: 176890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:32,917-Speed 2972.70 samples/sec   Loss 3.6416   LearningRate 0.0083   Epoch: 14   Global Step: 176900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:36,374-Speed 2962.95 samples/sec   Loss 3.6707   LearningRate 0.0083   Epoch: 14   Global Step: 176910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:39,801-Speed 2988.78 samples/sec   Loss 3.7027   LearningRate 0.0083   Epoch: 14   Global Step: 176920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:43,241-Speed 2977.54 samples/sec   Loss 3.7354   LearningRate 0.0083   Epoch: 14   Global Step: 176930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:46,646-Speed 3007.65 samples/sec   Loss 3.6631   LearningRate 0.0083   Epoch: 14   Global Step: 176940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:50,074-Speed 2988.37 samples/sec   Loss 3.6263   LearningRate 0.0083   Epoch: 14   Global Step: 176950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:53,437-Speed 3045.64 samples/sec   Loss 3.6835   LearningRate 0.0083   Epoch: 14   Global Step: 176960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:28:56,873-Speed 2981.13 samples/sec   Loss 3.6887   LearningRate 0.0083   Epoch: 14   Global Step: 176970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:29:00,274-Speed 3011.63 samples/sec   Loss 3.6794   LearningRate 0.0083   Epoch: 14   Global Step: 176980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:03,695-Speed 2994.32 samples/sec   Loss 3.7325   LearningRate 0.0083   Epoch: 14   Global Step: 176990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:07,090-Speed 3017.02 samples/sec   Loss 3.6954   LearningRate 0.0083   Epoch: 14   Global Step: 177000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:10,555-Speed 2956.89 samples/sec   Loss 3.7417   LearningRate 0.0083   Epoch: 14   Global Step: 177010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:13,980-Speed 2990.62 samples/sec   Loss 3.6530   LearningRate 0.0083   Epoch: 14   Global Step: 177020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:17,442-Speed 2958.46 samples/sec   Loss 3.6263   LearningRate 0.0083   Epoch: 14   Global Step: 177030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:20,842-Speed 3012.30 samples/sec   Loss 3.6992   LearningRate 0.0083   Epoch: 14   Global Step: 177040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:24,243-Speed 3012.22 samples/sec   Loss 3.6828   LearningRate 0.0083   Epoch: 14   Global Step: 177050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:27,609-Speed 3042.97 samples/sec   Loss 3.6821   LearningRate 0.0083   Epoch: 14   Global Step: 177060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:30,990-Speed 3029.39 samples/sec   Loss 3.6721   LearningRate 0.0082   Epoch: 14   Global Step: 177070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:34,479-Speed 2936.37 samples/sec   Loss 3.6841   LearningRate 0.0082   Epoch: 14   Global Step: 177080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:29:37,956-Speed 2946.03 samples/sec   Loss 3.6752   LearningRate 0.0082   Epoch: 14   Global Step: 177090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:29:41,368-Speed 3002.47 samples/sec   Loss 3.5961   LearningRate 0.0082   Epoch: 14   Global Step: 177100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:29:44,800-Speed 2984.36 samples/sec   Loss 3.7190   LearningRate 0.0082   Epoch: 14   Global Step: 177110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:29:48,141-Speed 3065.88 samples/sec   Loss 3.6705   LearningRate 0.0082   Epoch: 14   Global Step: 177120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:51,594-Speed 2966.45 samples/sec   Loss 3.6494   LearningRate 0.0082   Epoch: 14   Global Step: 177130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:55,055-Speed 2958.95 samples/sec   Loss 3.6538   LearningRate 0.0082   Epoch: 14   Global Step: 177140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:29:58,406-Speed 3056.55 samples/sec   Loss 3.6413   LearningRate 0.0082   Epoch: 14   Global Step: 177150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:30:01,778-Speed 3038.53 samples/sec   Loss 3.6896   LearningRate 0.0082   Epoch: 14   Global Step: 177160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:30:05,110-Speed 3073.82 samples/sec   Loss 3.6630   LearningRate 0.0082   Epoch: 14   Global Step: 177170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:30:08,513-Speed 3010.12 samples/sec   Loss 3.7917   LearningRate 0.0082   Epoch: 14   Global Step: 177180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:30:11,979-Speed 2955.36 samples/sec   Loss 3.6743   LearningRate 0.0082   Epoch: 14   Global Step: 177190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:30:15,358-Speed 3031.68 samples/sec   Loss 3.7047   LearningRate 0.0082   Epoch: 14   Global Step: 177200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:30:18,726-Speed 3041.13 samples/sec   Loss 3.6392   LearningRate 0.0082   Epoch: 14   Global Step: 177210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:30:22,110-Speed 3026.63 samples/sec   Loss 3.6874   LearningRate 0.0082   Epoch: 14   Global Step: 177220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:30:25,410-Speed 3103.30 samples/sec   Loss 3.6823   LearningRate 0.0082   Epoch: 14   Global Step: 177230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:30:28,809-Speed 3014.08 samples/sec   Loss 3.5995   LearningRate 0.0082   Epoch: 14   Global Step: 177240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:30:32,165-Speed 3051.66 samples/sec   Loss 3.6163   LearningRate 0.0082   Epoch: 14   Global Step: 177250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:30:35,522-Speed 3051.41 samples/sec   Loss 3.6916   LearningRate 0.0082   Epoch: 14   Global Step: 177260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:30:38,841-Speed 3086.26 samples/sec   Loss 3.5916   LearningRate 0.0082   Epoch: 14   Global Step: 177270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:30:42,225-Speed 3027.01 samples/sec   Loss 3.6325   LearningRate 0.0082   Epoch: 14   Global Step: 177280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:30:45,557-Speed 3073.36 samples/sec   Loss 3.6275   LearningRate 0.0082   Epoch: 14   Global Step: 177290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:30:48,927-Speed 3040.16 samples/sec   Loss 3.5975   LearningRate 0.0082   Epoch: 14   Global Step: 177300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:30:52,357-Speed 2986.16 samples/sec   Loss 3.6983   LearningRate 0.0082   Epoch: 14   Global Step: 177310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:30:55,728-Speed 3038.56 samples/sec   Loss 3.7547   LearningRate 0.0082   Epoch: 14   Global Step: 177320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:30:59,097-Speed 3040.33 samples/sec   Loss 3.6282   LearningRate 0.0082   Epoch: 14   Global Step: 177330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:31:02,454-Speed 3052.21 samples/sec   Loss 3.6922   LearningRate 0.0082   Epoch: 14   Global Step: 177340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:31:05,796-Speed 3064.34 samples/sec   Loss 3.7283   LearningRate 0.0082   Epoch: 14   Global Step: 177350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:31:09,133-Speed 3070.11 samples/sec   Loss 3.6130   LearningRate 0.0082   Epoch: 14   Global Step: 177360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:31:12,480-Speed 3060.06 samples/sec   Loss 3.7715   LearningRate 0.0082   Epoch: 14   Global Step: 177370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:31:15,908-Speed 2987.76 samples/sec   Loss 3.7692   LearningRate 0.0082   Epoch: 14   Global Step: 177380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:31:19,265-Speed 3051.76 samples/sec   Loss 3.6984   LearningRate 0.0082   Epoch: 14   Global Step: 177390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:31:22,735-Speed 2951.37 samples/sec   Loss 3.6467   LearningRate 0.0082   Epoch: 14   Global Step: 177400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:31:26,115-Speed 3030.95 samples/sec   Loss 3.6601   LearningRate 0.0082   Epoch: 14   Global Step: 177410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:31:29,567-Speed 2967.04 samples/sec   Loss 3.6298   LearningRate 0.0082   Epoch: 14   Global Step: 177420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:31:32,977-Speed 3003.24 samples/sec   Loss 3.6680   LearningRate 0.0082   Epoch: 14   Global Step: 177430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:31:36,292-Speed 3090.60 samples/sec   Loss 3.6917   LearningRate 0.0082   Epoch: 14   Global Step: 177440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:31:39,701-Speed 3004.70 samples/sec   Loss 3.7219   LearningRate 0.0082   Epoch: 14   Global Step: 177450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:31:43,067-Speed 3042.73 samples/sec   Loss 3.6419   LearningRate 0.0082   Epoch: 14   Global Step: 177460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:31:46,415-Speed 3059.15 samples/sec   Loss 3.7133   LearningRate 0.0082   Epoch: 14   Global Step: 177470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:31:49,734-Speed 3087.04 samples/sec   Loss 3.7156   LearningRate 0.0082   Epoch: 14   Global Step: 177480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:31:53,115-Speed 3029.42 samples/sec   Loss 3.7707   LearningRate 0.0082   Epoch: 14   Global Step: 177490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:31:56,490-Speed 3034.43 samples/sec   Loss 3.7001   LearningRate 0.0082   Epoch: 14   Global Step: 177500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:31:59,942-Speed 2967.74 samples/sec   Loss 3.6441   LearningRate 0.0081   Epoch: 14   Global Step: 177510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:32:03,368-Speed 2989.64 samples/sec   Loss 3.6669   LearningRate 0.0081   Epoch: 14   Global Step: 177520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:32:06,803-Speed 2981.71 samples/sec   Loss 3.7166   LearningRate 0.0081   Epoch: 14   Global Step: 177530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:32:10,235-Speed 2984.99 samples/sec   Loss 3.7398   LearningRate 0.0081   Epoch: 14   Global Step: 177540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:13,583-Speed 3059.36 samples/sec   Loss 3.7553   LearningRate 0.0081   Epoch: 14   Global Step: 177550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:17,076-Speed 2932.21 samples/sec   Loss 3.6915   LearningRate 0.0081   Epoch: 14   Global Step: 177560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:20,502-Speed 2989.64 samples/sec   Loss 3.7256   LearningRate 0.0081   Epoch: 14   Global Step: 177570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:23,926-Speed 2991.84 samples/sec   Loss 3.7016   LearningRate 0.0081   Epoch: 14   Global Step: 177580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:27,374-Speed 2970.27 samples/sec   Loss 3.6634   LearningRate 0.0081   Epoch: 14   Global Step: 177590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:30,690-Speed 3089.44 samples/sec   Loss 3.6605   LearningRate 0.0081   Epoch: 14   Global Step: 177600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:34,121-Speed 2985.25 samples/sec   Loss 3.7065   LearningRate 0.0081   Epoch: 14   Global Step: 177610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:37,520-Speed 3013.45 samples/sec   Loss 3.6765   LearningRate 0.0081   Epoch: 14   Global Step: 177620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:40,895-Speed 3034.96 samples/sec   Loss 3.6291   LearningRate 0.0081   Epoch: 14   Global Step: 177630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:44,219-Speed 3081.25 samples/sec   Loss 3.6802   LearningRate 0.0081   Epoch: 14   Global Step: 177640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-27 18:32:47,656-Speed 2980.67 samples/sec   Loss 3.7660   LearningRate 0.0081   Epoch: 14   Global Step: 177650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:51,040-Speed 3026.43 samples/sec   Loss 3.6830   LearningRate 0.0081   Epoch: 14   Global Step: 177660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:54,514-Speed 2948.30 samples/sec   Loss 3.7157   LearningRate 0.0081   Epoch: 14   Global Step: 177670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:32:57,930-Speed 2998.25 samples/sec   Loss 3.6448   LearningRate 0.0081   Epoch: 14   Global Step: 177680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:33:01,341-Speed 3003.62 samples/sec   Loss 3.6763   LearningRate 0.0081   Epoch: 14   Global Step: 177690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:33:04,703-Speed 3045.79 samples/sec   Loss 3.6251   LearningRate 0.0081   Epoch: 14   Global Step: 177700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:33:08,118-Speed 2999.11 samples/sec   Loss 3.7194   LearningRate 0.0081   Epoch: 14   Global Step: 177710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:33:11,498-Speed 3030.96 samples/sec   Loss 3.6425   LearningRate 0.0081   Epoch: 14   Global Step: 177720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:33:14,893-Speed 3016.60 samples/sec   Loss 3.7405   LearningRate 0.0081   Epoch: 14   Global Step: 177730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:33:18,263-Speed 3040.17 samples/sec   Loss 3.6910   LearningRate 0.0081   Epoch: 14   Global Step: 177740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:33:21,675-Speed 3002.03 samples/sec   Loss 3.7298   LearningRate 0.0081   Epoch: 14   Global Step: 177750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:33:25,118-Speed 2974.78 samples/sec   Loss 3.6828   LearningRate 0.0081   Epoch: 14   Global Step: 177760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:33:28,453-Speed 3072.09 samples/sec   Loss 3.7245   LearningRate 0.0081   Epoch: 14   Global Step: 177770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:33:31,833-Speed 3029.71 samples/sec   Loss 3.6248   LearningRate 0.0081   Epoch: 14   Global Step: 177780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:33:35,247-Speed 3000.56 samples/sec   Loss 3.6898   LearningRate 0.0081   Epoch: 14   Global Step: 177790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:33:38,698-Speed 2968.62 samples/sec   Loss 3.7166   LearningRate 0.0081   Epoch: 14   Global Step: 177800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:33:42,099-Speed 3011.23 samples/sec   Loss 3.6853   LearningRate 0.0081   Epoch: 14   Global Step: 177810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:33:45,418-Speed 3085.72 samples/sec   Loss 3.7173   LearningRate 0.0081   Epoch: 14   Global Step: 177820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:33:48,857-Speed 2979.16 samples/sec   Loss 3.7442   LearningRate 0.0081   Epoch: 14   Global Step: 177830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:33:52,187-Speed 3075.30 samples/sec   Loss 3.6897   LearningRate 0.0081   Epoch: 14   Global Step: 177840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:33:55,581-Speed 3017.45 samples/sec   Loss 3.7018   LearningRate 0.0081   Epoch: 14   Global Step: 177850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:33:59,021-Speed 2977.84 samples/sec   Loss 3.7670   LearningRate 0.0081   Epoch: 14   Global Step: 177860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:34:02,378-Speed 3051.53 samples/sec   Loss 3.7458   LearningRate 0.0081   Epoch: 14   Global Step: 177870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:34:05,786-Speed 3005.54 samples/sec   Loss 3.7706   LearningRate 0.0081   Epoch: 14   Global Step: 177880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:34:09,125-Speed 3067.02 samples/sec   Loss 3.8023   LearningRate 0.0081   Epoch: 14   Global Step: 177890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:34:12,486-Speed 3047.46 samples/sec   Loss 3.7609   LearningRate 0.0081   Epoch: 14   Global Step: 177900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:34:15,825-Speed 3067.82 samples/sec   Loss 3.7343   LearningRate 0.0081   Epoch: 14   Global Step: 177910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:34:19,180-Speed 3053.45 samples/sec   Loss 3.6785   LearningRate 0.0081   Epoch: 14   Global Step: 177920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:34:22,522-Speed 3064.68 samples/sec   Loss 3.7775   LearningRate 0.0081   Epoch: 14   Global Step: 177930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:34:25,863-Speed 3065.85 samples/sec   Loss 3.7761   LearningRate 0.0080   Epoch: 14   Global Step: 177940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:34:29,300-Speed 2980.10 samples/sec   Loss 3.8061   LearningRate 0.0080   Epoch: 14   Global Step: 177950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:34:32,614-Speed 3091.56 samples/sec   Loss 3.7081   LearningRate 0.0080   Epoch: 14   Global Step: 177960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:34:35,977-Speed 3045.73 samples/sec   Loss 3.7311   LearningRate 0.0080   Epoch: 14   Global Step: 177970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:34:39,375-Speed 3013.78 samples/sec   Loss 3.7214   LearningRate 0.0080   Epoch: 14   Global Step: 177980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:34:42,769-Speed 3017.63 samples/sec   Loss 3.6840   LearningRate 0.0080   Epoch: 14   Global Step: 177990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:34:46,216-Speed 2971.97 samples/sec   Loss 3.6014   LearningRate 0.0080   Epoch: 14   Global Step: 178000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:34:49,728-Speed 2916.60 samples/sec   Loss 3.7494   LearningRate 0.0080   Epoch: 14   Global Step: 178010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:34:53,078-Speed 3056.97 samples/sec   Loss 3.7437   LearningRate 0.0080   Epoch: 14   Global Step: 178020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:34:56,484-Speed 3006.78 samples/sec   Loss 3.8258   LearningRate 0.0080   Epoch: 14   Global Step: 178030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:34:59,900-Speed 2999.21 samples/sec   Loss 3.7208   LearningRate 0.0080   Epoch: 14   Global Step: 178040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:35:03,334-Speed 2982.62 samples/sec   Loss 3.7114   LearningRate 0.0080   Epoch: 14   Global Step: 178050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:35:06,755-Speed 2994.00 samples/sec   Loss 3.7670   LearningRate 0.0080   Epoch: 14   Global Step: 178060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:35:10,174-Speed 2995.67 samples/sec   Loss 3.7690   LearningRate 0.0080   Epoch: 14   Global Step: 178070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:35:13,567-Speed 3019.16 samples/sec   Loss 3.6843   LearningRate 0.0080   Epoch: 14   Global Step: 178080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:35:17,057-Speed 2934.70 samples/sec   Loss 3.7213   LearningRate 0.0080   Epoch: 14   Global Step: 178090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:35:20,470-Speed 3001.25 samples/sec   Loss 3.6534   LearningRate 0.0080   Epoch: 14   Global Step: 178100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:35:23,836-Speed 3043.44 samples/sec   Loss 3.7578   LearningRate 0.0080   Epoch: 14   Global Step: 178110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:35:27,207-Speed 3038.01 samples/sec   Loss 3.7171   LearningRate 0.0080   Epoch: 14   Global Step: 178120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:35:30,560-Speed 3054.81 samples/sec   Loss 3.6576   LearningRate 0.0080   Epoch: 14   Global Step: 178130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:35:33,978-Speed 2996.69 samples/sec   Loss 3.7475   LearningRate 0.0080   Epoch: 14   Global Step: 178140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:35:37,386-Speed 3006.08 samples/sec   Loss 3.7406   LearningRate 0.0080   Epoch: 14   Global Step: 178150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:35:40,811-Speed 2989.95 samples/sec   Loss 3.7847   LearningRate 0.0080   Epoch: 14   Global Step: 178160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:35:44,225-Speed 3000.37 samples/sec   Loss 3.7123   LearningRate 0.0080   Epoch: 14   Global Step: 178170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:35:47,594-Speed 3040.50 samples/sec   Loss 3.7696   LearningRate 0.0080   Epoch: 14   Global Step: 178180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:35:50,945-Speed 3057.60 samples/sec   Loss 3.7157   LearningRate 0.0080   Epoch: 14   Global Step: 178190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:35:54,268-Speed 3081.65 samples/sec   Loss 3.7368   LearningRate 0.0080   Epoch: 14   Global Step: 178200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:35:57,724-Speed 2964.36 samples/sec   Loss 3.7054   LearningRate 0.0080   Epoch: 14   Global Step: 178210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:01,027-Speed 3100.63 samples/sec   Loss 3.6492   LearningRate 0.0080   Epoch: 14   Global Step: 178220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:04,401-Speed 3035.68 samples/sec   Loss 3.8512   LearningRate 0.0080   Epoch: 14   Global Step: 178230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:07,813-Speed 3002.26 samples/sec   Loss 3.7780   LearningRate 0.0080   Epoch: 14   Global Step: 178240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:11,160-Speed 3059.90 samples/sec   Loss 3.8280   LearningRate 0.0080   Epoch: 14   Global Step: 178250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:14,503-Speed 3064.29 samples/sec   Loss 3.7041   LearningRate 0.0080   Epoch: 14   Global Step: 178260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:17,876-Speed 3037.22 samples/sec   Loss 3.7753   LearningRate 0.0080   Epoch: 14   Global Step: 178270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:36:21,268-Speed 3019.77 samples/sec   Loss 3.7123   LearningRate 0.0080   Epoch: 14   Global Step: 178280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:24,734-Speed 2954.65 samples/sec   Loss 3.7579   LearningRate 0.0080   Epoch: 14   Global Step: 178290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:28,181-Speed 2971.96 samples/sec   Loss 3.6747   LearningRate 0.0080   Epoch: 14   Global Step: 178300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:31,551-Speed 3039.32 samples/sec   Loss 3.7567   LearningRate 0.0080   Epoch: 14   Global Step: 178310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:34,971-Speed 2994.66 samples/sec   Loss 3.7644   LearningRate 0.0080   Epoch: 14   Global Step: 178320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:38,326-Speed 3052.54 samples/sec   Loss 3.7312   LearningRate 0.0080   Epoch: 14   Global Step: 178330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:41,746-Speed 2995.52 samples/sec   Loss 3.7185   LearningRate 0.0080   Epoch: 14   Global Step: 178340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:45,138-Speed 3019.70 samples/sec   Loss 3.7420   LearningRate 0.0080   Epoch: 14   Global Step: 178350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:48,535-Speed 3015.45 samples/sec   Loss 3.7708   LearningRate 0.0080   Epoch: 14   Global Step: 178360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:51,898-Speed 3044.98 samples/sec   Loss 3.7562   LearningRate 0.0080   Epoch: 14   Global Step: 178370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:36:55,207-Speed 3096.61 samples/sec   Loss 3.7712   LearningRate 0.0079   Epoch: 14   Global Step: 178380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:36:58,521-Speed 3090.81 samples/sec   Loss 3.7681   LearningRate 0.0079   Epoch: 14   Global Step: 178390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:37:01,846-Speed 3080.25 samples/sec   Loss 3.7896   LearningRate 0.0079   Epoch: 14   Global Step: 178400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:37:05,277-Speed 2985.44 samples/sec   Loss 3.6920   LearningRate 0.0079   Epoch: 14   Global Step: 178410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:37:08,620-Speed 3064.17 samples/sec   Loss 3.8231   LearningRate 0.0079   Epoch: 14   Global Step: 178420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:37:12,007-Speed 3024.73 samples/sec   Loss 3.7543   LearningRate 0.0079   Epoch: 14   Global Step: 178430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:37:15,433-Speed 2989.33 samples/sec   Loss 3.8285   LearningRate 0.0079   Epoch: 14   Global Step: 178440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:37:18,761-Speed 3077.20 samples/sec   Loss 3.7782   LearningRate 0.0079   Epoch: 14   Global Step: 178450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:37:22,226-Speed 2956.89 samples/sec   Loss 3.7709   LearningRate 0.0079   Epoch: 14   Global Step: 178460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:37:25,592-Speed 3042.85 samples/sec   Loss 3.7554   LearningRate 0.0079   Epoch: 14   Global Step: 178470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:37:28,986-Speed 3018.02 samples/sec   Loss 3.8565   LearningRate 0.0079   Epoch: 14   Global Step: 178480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:37:32,373-Speed 3023.66 samples/sec   Loss 3.7635   LearningRate 0.0079   Epoch: 14   Global Step: 178490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:37:35,765-Speed 3019.37 samples/sec   Loss 3.7580   LearningRate 0.0079   Epoch: 14   Global Step: 178500   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:37:39,159-Speed 3018.12 samples/sec   Loss 3.7520   LearningRate 0.0079   Epoch: 14   Global Step: 178510   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:37:42,519-Speed 3049.15 samples/sec   Loss 3.7361   LearningRate 0.0079   Epoch: 14   Global Step: 178520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:37:45,874-Speed 3053.14 samples/sec   Loss 3.8037   LearningRate 0.0079   Epoch: 14   Global Step: 178530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:37:49,204-Speed 3075.78 samples/sec   Loss 3.8248   LearningRate 0.0079   Epoch: 14   Global Step: 178540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:37:52,591-Speed 3023.67 samples/sec   Loss 3.7857   LearningRate 0.0079   Epoch: 14   Global Step: 178550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:37:55,940-Speed 3058.39 samples/sec   Loss 3.7271   LearningRate 0.0079   Epoch: 14   Global Step: 178560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:37:59,269-Speed 3077.77 samples/sec   Loss 3.7649   LearningRate 0.0079   Epoch: 14   Global Step: 178570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:38:02,641-Speed 3038.00 samples/sec   Loss 3.7137   LearningRate 0.0079   Epoch: 14   Global Step: 178580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:38:06,009-Speed 3041.22 samples/sec   Loss 3.7584   LearningRate 0.0079   Epoch: 14   Global Step: 178590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:38:09,396-Speed 3023.64 samples/sec   Loss 3.7646   LearningRate 0.0079   Epoch: 14   Global Step: 178600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:38:12,703-Speed 3097.33 samples/sec   Loss 3.7349   LearningRate 0.0079   Epoch: 14   Global Step: 178610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:38:16,026-Speed 3082.67 samples/sec   Loss 3.7840   LearningRate 0.0079   Epoch: 14   Global Step: 178620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:38:19,503-Speed 2945.20 samples/sec   Loss 3.7730   LearningRate 0.0079   Epoch: 14   Global Step: 178630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:38:22,849-Speed 3061.14 samples/sec   Loss 3.8323   LearningRate 0.0079   Epoch: 14   Global Step: 178640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:38:26,227-Speed 3032.92 samples/sec   Loss 3.7272   LearningRate 0.0079   Epoch: 14   Global Step: 178650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:38:29,565-Speed 3068.10 samples/sec   Loss 3.7733   LearningRate 0.0079   Epoch: 14   Global Step: 178660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:38:32,947-Speed 3028.35 samples/sec   Loss 3.6748   LearningRate 0.0079   Epoch: 14   Global Step: 178670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:38:36,333-Speed 3024.85 samples/sec   Loss 3.7495   LearningRate 0.0079   Epoch: 14   Global Step: 178680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:38:39,672-Speed 3068.05 samples/sec   Loss 3.8217   LearningRate 0.0079   Epoch: 14   Global Step: 178690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:38:43,061-Speed 3021.97 samples/sec   Loss 3.8015   LearningRate 0.0079   Epoch: 14   Global Step: 178700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:38:46,417-Speed 3052.49 samples/sec   Loss 3.7739   LearningRate 0.0079   Epoch: 14   Global Step: 178710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:38:49,812-Speed 3016.21 samples/sec   Loss 3.7033   LearningRate 0.0079   Epoch: 14   Global Step: 178720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:38:53,144-Speed 3074.12 samples/sec   Loss 3.8389   LearningRate 0.0079   Epoch: 14   Global Step: 178730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:38:56,509-Speed 3044.40 samples/sec   Loss 3.7141   LearningRate 0.0079   Epoch: 14   Global Step: 178740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:38:59,842-Speed 3072.39 samples/sec   Loss 3.7742   LearningRate 0.0079   Epoch: 14   Global Step: 178750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:39:03,157-Speed 3089.93 samples/sec   Loss 3.8081   LearningRate 0.0079   Epoch: 14   Global Step: 178760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:39:06,479-Speed 3083.40 samples/sec   Loss 3.8728   LearningRate 0.0079   Epoch: 14   Global Step: 178770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:39:09,943-Speed 2956.78 samples/sec   Loss 3.7501   LearningRate 0.0079   Epoch: 14   Global Step: 178780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:39:13,314-Speed 3038.71 samples/sec   Loss 3.7410   LearningRate 0.0079   Epoch: 14   Global Step: 178790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:39:16,617-Speed 3101.55 samples/sec   Loss 3.8342   LearningRate 0.0079   Epoch: 14   Global Step: 178800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:39:20,051-Speed 2982.08 samples/sec   Loss 3.6917   LearningRate 0.0079   Epoch: 14   Global Step: 178810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:39:23,505-Speed 2965.70 samples/sec   Loss 3.7788   LearningRate 0.0078   Epoch: 14   Global Step: 178820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:39:26,863-Speed 3050.00 samples/sec   Loss 3.7778   LearningRate 0.0078   Epoch: 14   Global Step: 178830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:39:30,240-Speed 3033.21 samples/sec   Loss 3.6494   LearningRate 0.0078   Epoch: 14   Global Step: 178840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:39:33,618-Speed 3032.40 samples/sec   Loss 3.7966   LearningRate 0.0078   Epoch: 14   Global Step: 178850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:39:36,974-Speed 3052.33 samples/sec   Loss 3.8853   LearningRate 0.0078   Epoch: 14   Global Step: 178860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:39:40,405-Speed 2984.53 samples/sec   Loss 3.7332   LearningRate 0.0078   Epoch: 14   Global Step: 178870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:39:43,793-Speed 3023.04 samples/sec   Loss 3.8546   LearningRate 0.0078   Epoch: 14   Global Step: 178880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:39:47,205-Speed 3002.40 samples/sec   Loss 3.7548   LearningRate 0.0078   Epoch: 14   Global Step: 178890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:39:50,565-Speed 3048.30 samples/sec   Loss 3.7768   LearningRate 0.0078   Epoch: 14   Global Step: 178900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:39:53,971-Speed 3007.20 samples/sec   Loss 3.7747   LearningRate 0.0078   Epoch: 14   Global Step: 178910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:39:57,384-Speed 3001.15 samples/sec   Loss 3.7895   LearningRate 0.0078   Epoch: 14   Global Step: 178920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:40:00,724-Speed 3067.05 samples/sec   Loss 3.8571   LearningRate 0.0078   Epoch: 14   Global Step: 178930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:40:04,070-Speed 3061.29 samples/sec   Loss 3.7576   LearningRate 0.0078   Epoch: 14   Global Step: 178940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:40:07,403-Speed 3072.49 samples/sec   Loss 3.8110   LearningRate 0.0078   Epoch: 14   Global Step: 178950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:40:10,833-Speed 2986.23 samples/sec   Loss 3.7378   LearningRate 0.0078   Epoch: 14   Global Step: 178960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:40:14,165-Speed 3074.72 samples/sec   Loss 3.7952   LearningRate 0.0078   Epoch: 14   Global Step: 178970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:40:17,513-Speed 3059.18 samples/sec   Loss 3.7570   LearningRate 0.0078   Epoch: 14   Global Step: 178980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:40:20,904-Speed 3020.14 samples/sec   Loss 3.7153   LearningRate 0.0078   Epoch: 14   Global Step: 178990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:40:24,237-Speed 3073.55 samples/sec   Loss 3.8232   LearningRate 0.0078   Epoch: 14   Global Step: 179000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:40:27,617-Speed 3030.41 samples/sec   Loss 3.8494   LearningRate 0.0078   Epoch: 14   Global Step: 179010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:40:31,009-Speed 3020.17 samples/sec   Loss 3.8173   LearningRate 0.0078   Epoch: 14   Global Step: 179020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:40:34,529-Speed 2910.23 samples/sec   Loss 3.7534   LearningRate 0.0078   Epoch: 14   Global Step: 179030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:40:37,871-Speed 3064.20 samples/sec   Loss 3.6976   LearningRate 0.0078   Epoch: 14   Global Step: 179040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:40:41,213-Speed 3064.71 samples/sec   Loss 3.6885   LearningRate 0.0078   Epoch: 14   Global Step: 179050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:40:44,547-Speed 3072.39 samples/sec   Loss 3.8557   LearningRate 0.0078   Epoch: 14   Global Step: 179060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:40:47,923-Speed 3034.30 samples/sec   Loss 3.8402   LearningRate 0.0078   Epoch: 14   Global Step: 179070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:40:51,296-Speed 3036.80 samples/sec   Loss 3.7026   LearningRate 0.0078   Epoch: 14   Global Step: 179080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:40:54,721-Speed 2990.18 samples/sec   Loss 3.7300   LearningRate 0.0078   Epoch: 14   Global Step: 179090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:40:58,091-Speed 3039.23 samples/sec   Loss 3.7835   LearningRate 0.0078   Epoch: 14   Global Step: 179100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 18:41:01,455-Speed 3045.15 samples/sec   Loss 3.7854   LearningRate 0.0078   Epoch: 14   Global Step: 179110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:04,776-Speed 3084.12 samples/sec   Loss 3.7713   LearningRate 0.0078   Epoch: 14   Global Step: 179120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:08,130-Speed 3054.17 samples/sec   Loss 3.7810   LearningRate 0.0078   Epoch: 14   Global Step: 179130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:11,448-Speed 3086.82 samples/sec   Loss 3.7837   LearningRate 0.0078   Epoch: 14   Global Step: 179140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:14,793-Speed 3061.51 samples/sec   Loss 3.8474   LearningRate 0.0078   Epoch: 14   Global Step: 179150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:18,171-Speed 3032.34 samples/sec   Loss 3.7342   LearningRate 0.0078   Epoch: 14   Global Step: 179160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:21,547-Speed 3034.17 samples/sec   Loss 3.8545   LearningRate 0.0078   Epoch: 14   Global Step: 179170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:24,972-Speed 2990.07 samples/sec   Loss 3.7700   LearningRate 0.0078   Epoch: 14   Global Step: 179180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:28,414-Speed 2976.09 samples/sec   Loss 3.8307   LearningRate 0.0078   Epoch: 14   Global Step: 179190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:31,764-Speed 3057.38 samples/sec   Loss 3.7315   LearningRate 0.0078   Epoch: 14   Global Step: 179200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:35,117-Speed 3055.20 samples/sec   Loss 3.7265   LearningRate 0.0078   Epoch: 14   Global Step: 179210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:41:38,453-Speed 3070.63 samples/sec   Loss 3.8436   LearningRate 0.0078   Epoch: 14   Global Step: 179220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:41:41,824-Speed 3037.76 samples/sec   Loss 3.8577   LearningRate 0.0078   Epoch: 14   Global Step: 179230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:41:45,155-Speed 3074.84 samples/sec   Loss 3.8389   LearningRate 0.0078   Epoch: 14   Global Step: 179240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:48,467-Speed 3092.81 samples/sec   Loss 3.8662   LearningRate 0.0078   Epoch: 14   Global Step: 179250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:51,790-Speed 3082.74 samples/sec   Loss 3.7185   LearningRate 0.0078   Epoch: 14   Global Step: 179260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:55,158-Speed 3040.48 samples/sec   Loss 3.7281   LearningRate 0.0077   Epoch: 14   Global Step: 179270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:41:58,591-Speed 2984.26 samples/sec   Loss 3.7683   LearningRate 0.0077   Epoch: 14   Global Step: 179280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:01,977-Speed 3024.74 samples/sec   Loss 3.7953   LearningRate 0.0077   Epoch: 14   Global Step: 179290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:05,340-Speed 3045.76 samples/sec   Loss 3.7809   LearningRate 0.0077   Epoch: 14   Global Step: 179300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:08,663-Speed 3082.62 samples/sec   Loss 3.7395   LearningRate 0.0077   Epoch: 14   Global Step: 179310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:12,014-Speed 3055.88 samples/sec   Loss 3.7032   LearningRate 0.0077   Epoch: 14   Global Step: 179320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:15,412-Speed 3014.50 samples/sec   Loss 3.8021   LearningRate 0.0077   Epoch: 14   Global Step: 179330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:18,798-Speed 3025.12 samples/sec   Loss 3.7645   LearningRate 0.0077   Epoch: 14   Global Step: 179340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:42:22,125-Speed 3078.40 samples/sec   Loss 3.8489   LearningRate 0.0077   Epoch: 14   Global Step: 179350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:25,467-Speed 3064.87 samples/sec   Loss 3.8013   LearningRate 0.0077   Epoch: 14   Global Step: 179360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:28,804-Speed 3069.70 samples/sec   Loss 3.7719   LearningRate 0.0077   Epoch: 14   Global Step: 179370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:32,177-Speed 3036.71 samples/sec   Loss 3.7452   LearningRate 0.0077   Epoch: 14   Global Step: 179380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:35,503-Speed 3079.87 samples/sec   Loss 3.7513   LearningRate 0.0077   Epoch: 14   Global Step: 179390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:38,867-Speed 3045.08 samples/sec   Loss 3.8895   LearningRate 0.0077   Epoch: 14   Global Step: 179400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:42,193-Speed 3079.17 samples/sec   Loss 3.8034   LearningRate 0.0077   Epoch: 14   Global Step: 179410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:45,645-Speed 2967.09 samples/sec   Loss 3.6867   LearningRate 0.0077   Epoch: 14   Global Step: 179420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:49,083-Speed 2979.51 samples/sec   Loss 3.9076   LearningRate 0.0077   Epoch: 14   Global Step: 179430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:52,482-Speed 3013.50 samples/sec   Loss 3.8369   LearningRate 0.0077   Epoch: 14   Global Step: 179440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:42:55,855-Speed 3036.66 samples/sec   Loss 3.6731   LearningRate 0.0077   Epoch: 14   Global Step: 179450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:42:59,227-Speed 3037.39 samples/sec   Loss 3.7029   LearningRate 0.0077   Epoch: 14   Global Step: 179460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:43:02,570-Speed 3064.88 samples/sec   Loss 3.8273   LearningRate 0.0077   Epoch: 14   Global Step: 179470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:43:05,970-Speed 3013.63 samples/sec   Loss 3.7687   LearningRate 0.0077   Epoch: 14   Global Step: 179480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:43:09,382-Speed 3002.17 samples/sec   Loss 3.8914   LearningRate 0.0077   Epoch: 14   Global Step: 179490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:43:12,707-Speed 3079.80 samples/sec   Loss 3.7307   LearningRate 0.0077   Epoch: 14   Global Step: 179500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:43:16,113-Speed 3007.46 samples/sec   Loss 3.8048   LearningRate 0.0077   Epoch: 14   Global Step: 179510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:43:19,567-Speed 2966.32 samples/sec   Loss 3.7205   LearningRate 0.0077   Epoch: 14   Global Step: 179520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:43:22,972-Speed 3007.94 samples/sec   Loss 3.8604   LearningRate 0.0077   Epoch: 14   Global Step: 179530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:43:26,352-Speed 3030.56 samples/sec   Loss 3.8053   LearningRate 0.0077   Epoch: 14   Global Step: 179540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:43:29,720-Speed 3041.29 samples/sec   Loss 3.7920   LearningRate 0.0077   Epoch: 14   Global Step: 179550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:43:33,123-Speed 3009.41 samples/sec   Loss 3.7262   LearningRate 0.0077   Epoch: 14   Global Step: 179560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:43:36,440-Speed 3088.51 samples/sec   Loss 3.8861   LearningRate 0.0077   Epoch: 14   Global Step: 179570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:43:39,833-Speed 3019.07 samples/sec   Loss 3.7416   LearningRate 0.0077   Epoch: 14   Global Step: 179580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 18:43:43,209-Speed 3034.11 samples/sec   Loss 3.7620   LearningRate 0.0077   Epoch: 14   Global Step: 179590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 18:43:46,610-Speed 3011.02 samples/sec   Loss 3.7690   LearningRate 0.0077   Epoch: 14   Global Step: 179600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:43:50,005-Speed 3017.38 samples/sec   Loss 3.7666   LearningRate 0.0077   Epoch: 14   Global Step: 179610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:43:53,386-Speed 3029.08 samples/sec   Loss 3.7774   LearningRate 0.0077   Epoch: 14   Global Step: 179620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:43:56,763-Speed 3033.10 samples/sec   Loss 3.8668   LearningRate 0.0077   Epoch: 14   Global Step: 179630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:00,114-Speed 3056.69 samples/sec   Loss 3.8580   LearningRate 0.0077   Epoch: 14   Global Step: 179640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:03,534-Speed 2996.11 samples/sec   Loss 3.7332   LearningRate 0.0077   Epoch: 14   Global Step: 179650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:06,955-Speed 2993.94 samples/sec   Loss 3.8313   LearningRate 0.0077   Epoch: 14   Global Step: 179660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:10,452-Speed 2928.84 samples/sec   Loss 3.8014   LearningRate 0.0077   Epoch: 14   Global Step: 179670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:13,850-Speed 3014.78 samples/sec   Loss 3.7981   LearningRate 0.0077   Epoch: 14   Global Step: 179680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:17,256-Speed 3007.30 samples/sec   Loss 3.8044   LearningRate 0.0077   Epoch: 14   Global Step: 179690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:44:20,653-Speed 3015.54 samples/sec   Loss 3.8146   LearningRate 0.0077   Epoch: 14   Global Step: 179700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:44:24,042-Speed 3022.30 samples/sec   Loss 3.7681   LearningRate 0.0077   Epoch: 14   Global Step: 179710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:27,478-Speed 2980.50 samples/sec   Loss 3.8260   LearningRate 0.0076   Epoch: 14   Global Step: 179720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:30,898-Speed 2995.40 samples/sec   Loss 3.7947   LearningRate 0.0076   Epoch: 14   Global Step: 179730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:34,293-Speed 3017.30 samples/sec   Loss 3.7394   LearningRate 0.0076   Epoch: 14   Global Step: 179740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:37,686-Speed 3018.63 samples/sec   Loss 3.8153   LearningRate 0.0076   Epoch: 14   Global Step: 179750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:41,123-Speed 2979.60 samples/sec   Loss 3.8087   LearningRate 0.0076   Epoch: 14   Global Step: 179760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:44,564-Speed 2976.90 samples/sec   Loss 3.7771   LearningRate 0.0076   Epoch: 14   Global Step: 179770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:47,928-Speed 3045.12 samples/sec   Loss 3.8985   LearningRate 0.0076   Epoch: 14   Global Step: 179780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:51,244-Speed 3088.28 samples/sec   Loss 3.7888   LearningRate 0.0076   Epoch: 14   Global Step: 179790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:54,619-Speed 3034.81 samples/sec   Loss 3.7879   LearningRate 0.0076   Epoch: 14   Global Step: 179800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:44:58,032-Speed 3001.89 samples/sec   Loss 3.8246   LearningRate 0.0076   Epoch: 14   Global Step: 179810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:45:01,384-Speed 3055.66 samples/sec   Loss 3.8297   LearningRate 0.0076   Epoch: 14   Global Step: 179820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:45:04,801-Speed 2997.29 samples/sec   Loss 3.8225   LearningRate 0.0076   Epoch: 14   Global Step: 179830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:45:08,178-Speed 3033.06 samples/sec   Loss 3.7580   LearningRate 0.0076   Epoch: 14   Global Step: 179840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:45:11,575-Speed 3015.46 samples/sec   Loss 3.7891   LearningRate 0.0076   Epoch: 14   Global Step: 179850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:45:14,968-Speed 3018.99 samples/sec   Loss 3.8007   LearningRate 0.0076   Epoch: 14   Global Step: 179860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:45:18,330-Speed 3046.31 samples/sec   Loss 3.7846   LearningRate 0.0076   Epoch: 14   Global Step: 179870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:45:21,647-Speed 3088.48 samples/sec   Loss 3.7598   LearningRate 0.0076   Epoch: 14   Global Step: 179880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:45:25,067-Speed 2994.35 samples/sec   Loss 3.8482   LearningRate 0.0076   Epoch: 14   Global Step: 179890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:45:28,529-Speed 2959.30 samples/sec   Loss 3.8504   LearningRate 0.0076   Epoch: 14   Global Step: 179900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:45:32,005-Speed 2946.71 samples/sec   Loss 3.8575   LearningRate 0.0076   Epoch: 14   Global Step: 179910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:45:35,337-Speed 3073.83 samples/sec   Loss 3.7530   LearningRate 0.0076   Epoch: 14   Global Step: 179920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:45:38,706-Speed 3040.87 samples/sec   Loss 3.7977   LearningRate 0.0076   Epoch: 14   Global Step: 179930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:45:42,041-Speed 3070.55 samples/sec   Loss 3.8136   LearningRate 0.0076   Epoch: 14   Global Step: 179940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:45:45,392-Speed 3056.86 samples/sec   Loss 3.8067   LearningRate 0.0076   Epoch: 14   Global Step: 179950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:45:48,808-Speed 2998.27 samples/sec   Loss 3.9815   LearningRate 0.0076   Epoch: 14   Global Step: 179960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:45:52,249-Speed 2977.24 samples/sec   Loss 3.8097   LearningRate 0.0076   Epoch: 14   Global Step: 179970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:45:55,601-Speed 3055.67 samples/sec   Loss 3.7939   LearningRate 0.0076   Epoch: 14   Global Step: 179980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:45:58,965-Speed 3044.82 samples/sec   Loss 3.9180   LearningRate 0.0076   Epoch: 14   Global Step: 179990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:02,349-Speed 3027.14 samples/sec   Loss 3.8647   LearningRate 0.0076   Epoch: 14   Global Step: 180000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:05,739-Speed 3021.05 samples/sec   Loss 3.7974   LearningRate 0.0076   Epoch: 14   Global Step: 180010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:09,096-Speed 3051.49 samples/sec   Loss 3.8220   LearningRate 0.0076   Epoch: 14   Global Step: 180020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:12,535-Speed 2978.23 samples/sec   Loss 3.7914   LearningRate 0.0076   Epoch: 14   Global Step: 180030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:16,029-Speed 2931.49 samples/sec   Loss 3.9199   LearningRate 0.0076   Epoch: 14   Global Step: 180040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:19,412-Speed 3027.90 samples/sec   Loss 3.8481   LearningRate 0.0076   Epoch: 14   Global Step: 180050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:22,726-Speed 3091.04 samples/sec   Loss 3.8450   LearningRate 0.0076   Epoch: 14   Global Step: 180060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:26,134-Speed 3005.29 samples/sec   Loss 3.6889   LearningRate 0.0076   Epoch: 14   Global Step: 180070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:29,477-Speed 3063.64 samples/sec   Loss 3.7927   LearningRate 0.0076   Epoch: 14   Global Step: 180080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:32,833-Speed 3052.78 samples/sec   Loss 3.7322   LearningRate 0.0076   Epoch: 14   Global Step: 180090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:36,210-Speed 3032.80 samples/sec   Loss 3.8396   LearningRate 0.0076   Epoch: 14   Global Step: 180100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:39,606-Speed 3016.44 samples/sec   Loss 3.8257   LearningRate 0.0076   Epoch: 14   Global Step: 180110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:42,979-Speed 3036.57 samples/sec   Loss 3.9163   LearningRate 0.0076   Epoch: 14   Global Step: 180120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:46,435-Speed 2963.65 samples/sec   Loss 3.7648   LearningRate 0.0076   Epoch: 14   Global Step: 180130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:49,790-Speed 3053.00 samples/sec   Loss 3.7577   LearningRate 0.0076   Epoch: 14   Global Step: 180140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:53,182-Speed 3020.23 samples/sec   Loss 3.7322   LearningRate 0.0076   Epoch: 14   Global Step: 180150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:56,560-Speed 3032.02 samples/sec   Loss 3.8762   LearningRate 0.0076   Epoch: 14   Global Step: 180160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:46:59,952-Speed 3020.20 samples/sec   Loss 3.9180   LearningRate 0.0075   Epoch: 14   Global Step: 180170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:47:03,375-Speed 2992.51 samples/sec   Loss 3.8097   LearningRate 0.0075   Epoch: 14   Global Step: 180180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:47:06,789-Speed 2999.94 samples/sec   Loss 3.8855   LearningRate 0.0075   Epoch: 14   Global Step: 180190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:47:10,253-Speed 2956.75 samples/sec   Loss 3.7909   LearningRate 0.0075   Epoch: 14   Global Step: 180200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:47:13,609-Speed 3052.25 samples/sec   Loss 3.9014   LearningRate 0.0075   Epoch: 14   Global Step: 180210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:47:16,966-Speed 3051.32 samples/sec   Loss 3.8079   LearningRate 0.0075   Epoch: 14   Global Step: 180220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:47:20,347-Speed 3029.22 samples/sec   Loss 3.9415   LearningRate 0.0075   Epoch: 14   Global Step: 180230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:47:23,827-Speed 2943.23 samples/sec   Loss 3.9097   LearningRate 0.0075   Epoch: 14   Global Step: 180240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:47:27,244-Speed 2997.39 samples/sec   Loss 3.8222   LearningRate 0.0075   Epoch: 14   Global Step: 180250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:47:30,591-Speed 3060.71 samples/sec   Loss 3.8298   LearningRate 0.0075   Epoch: 14   Global Step: 180260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:47:33,970-Speed 3031.13 samples/sec   Loss 3.8217   LearningRate 0.0075   Epoch: 14   Global Step: 180270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:47:37,301-Speed 3075.40 samples/sec   Loss 3.7461   LearningRate 0.0075   Epoch: 14   Global Step: 180280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:47:40,727-Speed 2989.94 samples/sec   Loss 3.8721   LearningRate 0.0075   Epoch: 14   Global Step: 180290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:47:44,133-Speed 3007.33 samples/sec   Loss 3.8295   LearningRate 0.0075   Epoch: 14   Global Step: 180300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:47:47,552-Speed 2995.10 samples/sec   Loss 3.9276   LearningRate 0.0075   Epoch: 14   Global Step: 180310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:47:50,976-Speed 2992.13 samples/sec   Loss 3.7979   LearningRate 0.0075   Epoch: 14   Global Step: 180320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:47:54,393-Speed 2997.84 samples/sec   Loss 3.8777   LearningRate 0.0075   Epoch: 14   Global Step: 180330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:47:57,810-Speed 2997.72 samples/sec   Loss 4.0004   LearningRate 0.0075   Epoch: 14   Global Step: 180340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:48:01,199-Speed 3021.92 samples/sec   Loss 3.9074   LearningRate 0.0075   Epoch: 14   Global Step: 180350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:48:04,550-Speed 3056.71 samples/sec   Loss 3.8353   LearningRate 0.0075   Epoch: 14   Global Step: 180360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:48:07,889-Speed 3067.49 samples/sec   Loss 3.9097   LearningRate 0.0075   Epoch: 14   Global Step: 180370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:48:11,243-Speed 3053.97 samples/sec   Loss 3.7003   LearningRate 0.0075   Epoch: 14   Global Step: 180380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:48:14,601-Speed 3050.44 samples/sec   Loss 3.8941   LearningRate 0.0075   Epoch: 14   Global Step: 180390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:48:18,065-Speed 2956.80 samples/sec   Loss 3.8731   LearningRate 0.0075   Epoch: 14   Global Step: 180400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:48:21,531-Speed 2954.91 samples/sec   Loss 3.8482   LearningRate 0.0075   Epoch: 14   Global Step: 180410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:48:24,971-Speed 2978.61 samples/sec   Loss 3.8147   LearningRate 0.0075   Epoch: 14   Global Step: 180420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:48:28,395-Speed 2990.69 samples/sec   Loss 3.8437   LearningRate 0.0075   Epoch: 14   Global Step: 180430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:48:31,822-Speed 2989.09 samples/sec   Loss 3.9193   LearningRate 0.0075   Epoch: 14   Global Step: 180440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:48:35,178-Speed 3051.72 samples/sec   Loss 3.8340   LearningRate 0.0075   Epoch: 14   Global Step: 180450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:48:38,646-Speed 2953.70 samples/sec   Loss 3.7156   LearningRate 0.0075   Epoch: 14   Global Step: 180460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:48:42,064-Speed 2997.12 samples/sec   Loss 3.8208   LearningRate 0.0075   Epoch: 14   Global Step: 180470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:48:45,482-Speed 2996.50 samples/sec   Loss 3.7919   LearningRate 0.0075   Epoch: 14   Global Step: 180480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:48:48,925-Speed 2974.94 samples/sec   Loss 3.8862   LearningRate 0.0075   Epoch: 14   Global Step: 180490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:48:52,310-Speed 3026.01 samples/sec   Loss 3.8443   LearningRate 0.0075   Epoch: 14   Global Step: 180500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:48:55,736-Speed 2990.05 samples/sec   Loss 3.7845   LearningRate 0.0075   Epoch: 14   Global Step: 180510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:48:59,157-Speed 2993.42 samples/sec   Loss 3.8800   LearningRate 0.0075   Epoch: 14   Global Step: 180520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:49:02,562-Speed 3008.61 samples/sec   Loss 3.7901   LearningRate 0.0075   Epoch: 14   Global Step: 180530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:05,928-Speed 3043.38 samples/sec   Loss 3.8066   LearningRate 0.0075   Epoch: 14   Global Step: 180540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:09,318-Speed 3020.98 samples/sec   Loss 3.8260   LearningRate 0.0075   Epoch: 14   Global Step: 180550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:12,629-Speed 3093.77 samples/sec   Loss 3.9058   LearningRate 0.0075   Epoch: 14   Global Step: 180560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:15,997-Speed 3040.94 samples/sec   Loss 3.7413   LearningRate 0.0075   Epoch: 14   Global Step: 180570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:19,346-Speed 3058.29 samples/sec   Loss 3.8026   LearningRate 0.0075   Epoch: 14   Global Step: 180580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:22,783-Speed 2980.89 samples/sec   Loss 3.8827   LearningRate 0.0075   Epoch: 14   Global Step: 180590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:26,214-Speed 2984.46 samples/sec   Loss 3.7166   LearningRate 0.0075   Epoch: 14   Global Step: 180600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:29,637-Speed 2992.50 samples/sec   Loss 3.9005   LearningRate 0.0075   Epoch: 14   Global Step: 180610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:33,036-Speed 3014.09 samples/sec   Loss 3.8259   LearningRate 0.0074   Epoch: 14   Global Step: 180620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:36,457-Speed 2993.22 samples/sec   Loss 3.8829   LearningRate 0.0074   Epoch: 14   Global Step: 180630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:49:39,975-Speed 2912.34 samples/sec   Loss 3.8735   LearningRate 0.0074   Epoch: 14   Global Step: 180640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:49:43,446-Speed 2950.97 samples/sec   Loss 3.7401   LearningRate 0.0074   Epoch: 14   Global Step: 180650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:46,833-Speed 3024.39 samples/sec   Loss 3.9718   LearningRate 0.0074   Epoch: 14   Global Step: 180660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:50,171-Speed 3068.06 samples/sec   Loss 3.7744   LearningRate 0.0074   Epoch: 14   Global Step: 180670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:53,645-Speed 2948.76 samples/sec   Loss 3.8868   LearningRate 0.0074   Epoch: 14   Global Step: 180680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:49:57,050-Speed 3007.76 samples/sec   Loss 3.9325   LearningRate 0.0074   Epoch: 14   Global Step: 180690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:50:00,475-Speed 2990.61 samples/sec   Loss 3.8027   LearningRate 0.0074   Epoch: 14   Global Step: 180700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:50:03,867-Speed 3019.78 samples/sec   Loss 3.8524   LearningRate 0.0074   Epoch: 14   Global Step: 180710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:50:07,283-Speed 2998.59 samples/sec   Loss 3.7930   LearningRate 0.0074   Epoch: 14   Global Step: 180720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:50:10,667-Speed 3026.70 samples/sec   Loss 3.8481   LearningRate 0.0074   Epoch: 14   Global Step: 180730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:50:13,973-Speed 3098.12 samples/sec   Loss 3.8484   LearningRate 0.0074   Epoch: 14   Global Step: 180740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:50:17,332-Speed 3048.96 samples/sec   Loss 3.8422   LearningRate 0.0074   Epoch: 14   Global Step: 180750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:50:20,719-Speed 3024.33 samples/sec   Loss 3.7613   LearningRate 0.0074   Epoch: 14   Global Step: 180760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:50:24,146-Speed 2988.83 samples/sec   Loss 3.8791   LearningRate 0.0074   Epoch: 14   Global Step: 180770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:50:27,579-Speed 2983.68 samples/sec   Loss 3.8632   LearningRate 0.0074   Epoch: 14   Global Step: 180780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:50:30,930-Speed 3056.90 samples/sec   Loss 3.8453   LearningRate 0.0074   Epoch: 14   Global Step: 180790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:50:34,364-Speed 2982.64 samples/sec   Loss 3.8740   LearningRate 0.0074   Epoch: 14   Global Step: 180800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:50:37,714-Speed 3057.10 samples/sec   Loss 3.9030   LearningRate 0.0074   Epoch: 14   Global Step: 180810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:50:41,104-Speed 3021.97 samples/sec   Loss 3.8576   LearningRate 0.0074   Epoch: 14   Global Step: 180820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:50:44,504-Speed 3012.84 samples/sec   Loss 3.9316   LearningRate 0.0074   Epoch: 14   Global Step: 180830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:50:47,854-Speed 3057.23 samples/sec   Loss 3.9195   LearningRate 0.0074   Epoch: 14   Global Step: 180840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:50:51,268-Speed 3000.10 samples/sec   Loss 3.8249   LearningRate 0.0074   Epoch: 14   Global Step: 180850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:50:54,639-Speed 3038.77 samples/sec   Loss 3.8997   LearningRate 0.0074   Epoch: 14   Global Step: 180860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:50:57,984-Speed 3062.34 samples/sec   Loss 3.9393   LearningRate 0.0074   Epoch: 14   Global Step: 180870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:51:01,366-Speed 3028.01 samples/sec   Loss 3.7622   LearningRate 0.0074   Epoch: 14   Global Step: 180880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:51:04,745-Speed 3031.42 samples/sec   Loss 3.7346   LearningRate 0.0074   Epoch: 14   Global Step: 180890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:51:08,090-Speed 3062.41 samples/sec   Loss 3.7973   LearningRate 0.0074   Epoch: 14   Global Step: 180900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:51:11,452-Speed 3046.80 samples/sec   Loss 3.7955   LearningRate 0.0074   Epoch: 14   Global Step: 180910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:51:14,895-Speed 2974.78 samples/sec   Loss 3.8909   LearningRate 0.0074   Epoch: 14   Global Step: 180920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:51:18,293-Speed 3014.76 samples/sec   Loss 3.7925   LearningRate 0.0074   Epoch: 14   Global Step: 180930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:51:21,706-Speed 3000.62 samples/sec   Loss 3.8124   LearningRate 0.0074   Epoch: 14   Global Step: 180940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:51:25,080-Speed 3036.12 samples/sec   Loss 3.7337   LearningRate 0.0074   Epoch: 14   Global Step: 180950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:51:28,475-Speed 3017.42 samples/sec   Loss 3.8435   LearningRate 0.0074   Epoch: 14   Global Step: 180960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:51:31,820-Speed 3062.21 samples/sec   Loss 3.8093   LearningRate 0.0074   Epoch: 14   Global Step: 180970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:51:35,161-Speed 3066.29 samples/sec   Loss 3.8510   LearningRate 0.0074   Epoch: 14   Global Step: 180980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:51:38,519-Speed 3050.56 samples/sec   Loss 3.7718   LearningRate 0.0074   Epoch: 14   Global Step: 180990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:51:41,847-Speed 3077.54 samples/sec   Loss 3.8000   LearningRate 0.0074   Epoch: 14   Global Step: 181000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:51:45,172-Speed 3080.30 samples/sec   Loss 3.7667   LearningRate 0.0074   Epoch: 14   Global Step: 181010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:51:48,499-Speed 3079.23 samples/sec   Loss 3.7846   LearningRate 0.0074   Epoch: 14   Global Step: 181020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:51:51,850-Speed 3056.91 samples/sec   Loss 3.8365   LearningRate 0.0074   Epoch: 14   Global Step: 181030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:51:55,219-Speed 3040.24 samples/sec   Loss 3.8547   LearningRate 0.0074   Epoch: 14   Global Step: 181040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:51:58,600-Speed 3029.47 samples/sec   Loss 3.8391   LearningRate 0.0074   Epoch: 14   Global Step: 181050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:52:01,982-Speed 3029.82 samples/sec   Loss 3.8983   LearningRate 0.0074   Epoch: 14   Global Step: 181060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:52:05,412-Speed 2986.46 samples/sec   Loss 3.8819   LearningRate 0.0074   Epoch: 14   Global Step: 181070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:52:08,753-Speed 3066.71 samples/sec   Loss 3.7908   LearningRate 0.0073   Epoch: 14   Global Step: 181080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:52:12,122-Speed 3040.06 samples/sec   Loss 3.8503   LearningRate 0.0073   Epoch: 14   Global Step: 181090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:52:15,492-Speed 3039.89 samples/sec   Loss 3.9363   LearningRate 0.0073   Epoch: 14   Global Step: 181100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:52:18,861-Speed 3040.34 samples/sec   Loss 3.8478   LearningRate 0.0073   Epoch: 14   Global Step: 181110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:52:22,253-Speed 3019.30 samples/sec   Loss 3.8104   LearningRate 0.0073   Epoch: 14   Global Step: 181120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:52:25,585-Speed 3074.25 samples/sec   Loss 3.9628   LearningRate 0.0073   Epoch: 14   Global Step: 181130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:52:29,002-Speed 2997.54 samples/sec   Loss 3.8416   LearningRate 0.0073   Epoch: 14   Global Step: 181140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:52:32,403-Speed 3011.49 samples/sec   Loss 3.7971   LearningRate 0.0073   Epoch: 14   Global Step: 181150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:52:35,791-Speed 3023.80 samples/sec   Loss 3.8850   LearningRate 0.0073   Epoch: 14   Global Step: 181160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:52:39,189-Speed 3014.71 samples/sec   Loss 3.8146   LearningRate 0.0073   Epoch: 14   Global Step: 181170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:52:42,548-Speed 3049.28 samples/sec   Loss 3.8573   LearningRate 0.0073   Epoch: 14   Global Step: 181180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:52:45,911-Speed 3045.63 samples/sec   Loss 3.8933   LearningRate 0.0073   Epoch: 14   Global Step: 181190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:52:49,283-Speed 3037.59 samples/sec   Loss 3.8663   LearningRate 0.0073   Epoch: 14   Global Step: 181200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:52:52,716-Speed 2983.40 samples/sec   Loss 3.7668   LearningRate 0.0073   Epoch: 14   Global Step: 181210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:52:56,046-Speed 3076.23 samples/sec   Loss 3.8547   LearningRate 0.0073   Epoch: 14   Global Step: 181220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:52:59,407-Speed 3047.81 samples/sec   Loss 3.9038   LearningRate 0.0073   Epoch: 14   Global Step: 181230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:53:02,823-Speed 2998.35 samples/sec   Loss 3.8135   LearningRate 0.0073   Epoch: 14   Global Step: 181240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:53:06,186-Speed 3044.96 samples/sec   Loss 3.8259   LearningRate 0.0073   Epoch: 14   Global Step: 181250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:53:09,581-Speed 3017.46 samples/sec   Loss 3.8982   LearningRate 0.0073   Epoch: 14   Global Step: 181260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:53:12,929-Speed 3059.32 samples/sec   Loss 3.8686   LearningRate 0.0073   Epoch: 14   Global Step: 181270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:53:16,334-Speed 3008.44 samples/sec   Loss 3.8917   LearningRate 0.0073   Epoch: 14   Global Step: 181280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:53:19,701-Speed 3042.74 samples/sec   Loss 4.0029   LearningRate 0.0073   Epoch: 14   Global Step: 181290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:53:23,046-Speed 3062.01 samples/sec   Loss 3.7996   LearningRate 0.0073   Epoch: 14   Global Step: 181300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:53:26,364-Speed 3087.08 samples/sec   Loss 3.7948   LearningRate 0.0073   Epoch: 14   Global Step: 181310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:53:29,741-Speed 3033.21 samples/sec   Loss 3.7668   LearningRate 0.0073   Epoch: 14   Global Step: 181320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:53:33,139-Speed 3014.88 samples/sec   Loss 3.9027   LearningRate 0.0073   Epoch: 14   Global Step: 181330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:53:36,633-Speed 2930.94 samples/sec   Loss 3.7825   LearningRate 0.0073   Epoch: 14   Global Step: 181340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:53:40,029-Speed 3016.44 samples/sec   Loss 3.8796   LearningRate 0.0073   Epoch: 14   Global Step: 181350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:53:43,418-Speed 3022.52 samples/sec   Loss 3.8772   LearningRate 0.0073   Epoch: 14   Global Step: 181360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:53:46,785-Speed 3041.71 samples/sec   Loss 3.8925   LearningRate 0.0073   Epoch: 14   Global Step: 181370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:53:50,156-Speed 3039.28 samples/sec   Loss 3.7835   LearningRate 0.0073   Epoch: 14   Global Step: 181380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:53:53,542-Speed 3024.57 samples/sec   Loss 3.7764   LearningRate 0.0073   Epoch: 14   Global Step: 181390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:53:56,893-Speed 3056.99 samples/sec   Loss 3.8377   LearningRate 0.0073   Epoch: 14   Global Step: 181400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:00,301-Speed 3006.05 samples/sec   Loss 3.8537   LearningRate 0.0073   Epoch: 14   Global Step: 181410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:54:03,663-Speed 3046.97 samples/sec   Loss 3.8749   LearningRate 0.0073   Epoch: 14   Global Step: 181420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:54:07,089-Speed 2989.31 samples/sec   Loss 3.8655   LearningRate 0.0073   Epoch: 14   Global Step: 181430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:10,600-Speed 2917.28 samples/sec   Loss 3.9109   LearningRate 0.0073   Epoch: 14   Global Step: 181440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:13,996-Speed 3016.09 samples/sec   Loss 3.8710   LearningRate 0.0073   Epoch: 14   Global Step: 181450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:17,354-Speed 3050.34 samples/sec   Loss 3.9072   LearningRate 0.0073   Epoch: 14   Global Step: 181460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:20,701-Speed 3061.19 samples/sec   Loss 3.8955   LearningRate 0.0073   Epoch: 14   Global Step: 181470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:24,132-Speed 2984.80 samples/sec   Loss 3.8356   LearningRate 0.0073   Epoch: 14   Global Step: 181480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:27,502-Speed 3039.32 samples/sec   Loss 3.9398   LearningRate 0.0073   Epoch: 14   Global Step: 181490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:30,897-Speed 3016.95 samples/sec   Loss 3.8035   LearningRate 0.0073   Epoch: 14   Global Step: 181500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:34,279-Speed 3028.77 samples/sec   Loss 3.7515   LearningRate 0.0073   Epoch: 14   Global Step: 181510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:37,655-Speed 3034.07 samples/sec   Loss 3.9065   LearningRate 0.0073   Epoch: 14   Global Step: 181520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:41,024-Speed 3040.47 samples/sec   Loss 3.8540   LearningRate 0.0073   Epoch: 14   Global Step: 181530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:54:44,449-Speed 2990.61 samples/sec   Loss 3.8312   LearningRate 0.0072   Epoch: 14   Global Step: 181540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:54:47,839-Speed 3021.91 samples/sec   Loss 3.8491   LearningRate 0.0072   Epoch: 14   Global Step: 181550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:51,150-Speed 3093.28 samples/sec   Loss 3.7690   LearningRate 0.0072   Epoch: 14   Global Step: 181560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:54,522-Speed 3037.62 samples/sec   Loss 3.8495   LearningRate 0.0072   Epoch: 14   Global Step: 181570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:54:57,853-Speed 3074.92 samples/sec   Loss 3.7092   LearningRate 0.0072   Epoch: 14   Global Step: 181580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:55:01,164-Speed 3093.87 samples/sec   Loss 3.8976   LearningRate 0.0072   Epoch: 14   Global Step: 181590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:55:04,531-Speed 3041.77 samples/sec   Loss 3.8134   LearningRate 0.0072   Epoch: 14   Global Step: 181600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:55:07,905-Speed 3036.31 samples/sec   Loss 3.8490   LearningRate 0.0072   Epoch: 14   Global Step: 181610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:55:11,375-Speed 2952.00 samples/sec   Loss 3.8598   LearningRate 0.0072   Epoch: 14   Global Step: 181620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:55:14,740-Speed 3043.92 samples/sec   Loss 3.8377   LearningRate 0.0072   Epoch: 14   Global Step: 181630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:55:18,235-Speed 2930.70 samples/sec   Loss 3.7708   LearningRate 0.0072   Epoch: 14   Global Step: 181640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:55:21,600-Speed 3043.91 samples/sec   Loss 3.8299   LearningRate 0.0072   Epoch: 14   Global Step: 181650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:55:24,990-Speed 3021.67 samples/sec   Loss 3.8221   LearningRate 0.0072   Epoch: 14   Global Step: 181660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:55:28,408-Speed 2996.61 samples/sec   Loss 3.8885   LearningRate 0.0072   Epoch: 14   Global Step: 181670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:55:31,816-Speed 3005.56 samples/sec   Loss 3.7826   LearningRate 0.0072   Epoch: 14   Global Step: 181680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:55:35,172-Speed 3051.49 samples/sec   Loss 3.8625   LearningRate 0.0072   Epoch: 14   Global Step: 181690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:55:38,570-Speed 3014.34 samples/sec   Loss 3.8241   LearningRate 0.0072   Epoch: 14   Global Step: 181700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:55:41,948-Speed 3032.55 samples/sec   Loss 3.7330   LearningRate 0.0072   Epoch: 14   Global Step: 181710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:55:45,442-Speed 2931.93 samples/sec   Loss 3.8200   LearningRate 0.0072   Epoch: 14   Global Step: 181720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:55:48,834-Speed 3019.31 samples/sec   Loss 3.8552   LearningRate 0.0072   Epoch: 14   Global Step: 181730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:55:52,324-Speed 2935.56 samples/sec   Loss 3.8376   LearningRate 0.0072   Epoch: 14   Global Step: 181740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:55:55,711-Speed 3023.50 samples/sec   Loss 3.8380   LearningRate 0.0072   Epoch: 14   Global Step: 181750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:55:59,073-Speed 3046.46 samples/sec   Loss 3.9047   LearningRate 0.0072   Epoch: 14   Global Step: 181760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:56:02,451-Speed 3032.69 samples/sec   Loss 3.8483   LearningRate 0.0072   Epoch: 14   Global Step: 181770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:56:05,856-Speed 3008.33 samples/sec   Loss 3.8237   LearningRate 0.0072   Epoch: 14   Global Step: 181780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:56:09,305-Speed 2969.60 samples/sec   Loss 3.8512   LearningRate 0.0072   Epoch: 14   Global Step: 181790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:56:12,781-Speed 2947.40 samples/sec   Loss 3.8756   LearningRate 0.0072   Epoch: 14   Global Step: 181800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:56:16,102-Speed 3083.89 samples/sec   Loss 3.7846   LearningRate 0.0072   Epoch: 14   Global Step: 181810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:56:19,503-Speed 3011.69 samples/sec   Loss 3.8692   LearningRate 0.0072   Epoch: 14   Global Step: 181820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:56:22,850-Speed 3060.36 samples/sec   Loss 3.9153   LearningRate 0.0072   Epoch: 14   Global Step: 181830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:56:26,274-Speed 2992.02 samples/sec   Loss 3.8776   LearningRate 0.0072   Epoch: 14   Global Step: 181840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:56:29,615-Speed 3065.15 samples/sec   Loss 3.8512   LearningRate 0.0072   Epoch: 14   Global Step: 181850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:56:32,956-Speed 3066.11 samples/sec   Loss 3.8787   LearningRate 0.0072   Epoch: 14   Global Step: 181860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:56:36,423-Speed 2954.47 samples/sec   Loss 3.8824   LearningRate 0.0072   Epoch: 14   Global Step: 181870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:56:39,793-Speed 3039.18 samples/sec   Loss 3.8790   LearningRate 0.0072   Epoch: 14   Global Step: 181880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:56:43,124-Speed 3075.58 samples/sec   Loss 3.8753   LearningRate 0.0072   Epoch: 14   Global Step: 181890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:56:46,547-Speed 2992.15 samples/sec   Loss 3.8943   LearningRate 0.0072   Epoch: 14   Global Step: 181900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:56:49,892-Speed 3063.05 samples/sec   Loss 3.8688   LearningRate 0.0072   Epoch: 14   Global Step: 181910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:56:53,263-Speed 3038.42 samples/sec   Loss 3.7868   LearningRate 0.0072   Epoch: 14   Global Step: 181920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:56:56,568-Speed 3098.62 samples/sec   Loss 3.9332   LearningRate 0.0072   Epoch: 14   Global Step: 181930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:56:59,871-Speed 3100.64 samples/sec   Loss 3.9109   LearningRate 0.0072   Epoch: 14   Global Step: 181940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:03,287-Speed 2999.33 samples/sec   Loss 3.8482   LearningRate 0.0072   Epoch: 14   Global Step: 181950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:06,786-Speed 2927.14 samples/sec   Loss 3.9380   LearningRate 0.0072   Epoch: 14   Global Step: 181960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:10,172-Speed 3025.76 samples/sec   Loss 3.8541   LearningRate 0.0072   Epoch: 14   Global Step: 181970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:13,566-Speed 3017.77 samples/sec   Loss 3.7033   LearningRate 0.0072   Epoch: 14   Global Step: 181980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:16,910-Speed 3062.62 samples/sec   Loss 3.9132   LearningRate 0.0072   Epoch: 14   Global Step: 181990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:20,263-Speed 3054.95 samples/sec   Loss 3.8173   LearningRate 0.0071   Epoch: 14   Global Step: 182000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:23,601-Speed 3068.78 samples/sec   Loss 3.9388   LearningRate 0.0071   Epoch: 14   Global Step: 182010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:27,063-Speed 2958.38 samples/sec   Loss 3.8462   LearningRate 0.0071   Epoch: 14   Global Step: 182020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:30,455-Speed 3019.28 samples/sec   Loss 3.9260   LearningRate 0.0071   Epoch: 14   Global Step: 182030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:33,781-Speed 3080.09 samples/sec   Loss 3.8798   LearningRate 0.0071   Epoch: 14   Global Step: 182040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:37,147-Speed 3042.65 samples/sec   Loss 3.8145   LearningRate 0.0071   Epoch: 14   Global Step: 182050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:40,457-Speed 3094.89 samples/sec   Loss 3.8500   LearningRate 0.0071   Epoch: 14   Global Step: 182060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:43,822-Speed 3043.36 samples/sec   Loss 3.9110   LearningRate 0.0071   Epoch: 14   Global Step: 182070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:47,129-Speed 3097.77 samples/sec   Loss 3.8276   LearningRate 0.0071   Epoch: 14   Global Step: 182080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:50,471-Speed 3064.71 samples/sec   Loss 3.9180   LearningRate 0.0071   Epoch: 14   Global Step: 182090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:53,874-Speed 3010.23 samples/sec   Loss 3.8305   LearningRate 0.0071   Epoch: 14   Global Step: 182100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:57:57,290-Speed 2997.65 samples/sec   Loss 3.7933   LearningRate 0.0071   Epoch: 14   Global Step: 182110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:58:00,639-Speed 3058.93 samples/sec   Loss 3.8173   LearningRate 0.0071   Epoch: 14   Global Step: 182120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:58:04,024-Speed 3026.22 samples/sec   Loss 3.7803   LearningRate 0.0071   Epoch: 14   Global Step: 182130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:58:07,438-Speed 3000.41 samples/sec   Loss 3.8584   LearningRate 0.0071   Epoch: 14   Global Step: 182140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:58:10,757-Speed 3086.56 samples/sec   Loss 3.9090   LearningRate 0.0071   Epoch: 14   Global Step: 182150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:58:14,142-Speed 3025.35 samples/sec   Loss 3.7602   LearningRate 0.0071   Epoch: 14   Global Step: 182160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:58:17,502-Speed 3048.55 samples/sec   Loss 3.7363   LearningRate 0.0071   Epoch: 14   Global Step: 182170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:58:20,833-Speed 3075.47 samples/sec   Loss 3.8142   LearningRate 0.0071   Epoch: 14   Global Step: 182180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:58:24,176-Speed 3063.39 samples/sec   Loss 3.8441   LearningRate 0.0071   Epoch: 14   Global Step: 182190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:58:27,579-Speed 3009.90 samples/sec   Loss 3.8369   LearningRate 0.0071   Epoch: 14   Global Step: 182200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:58:31,002-Speed 2993.22 samples/sec   Loss 3.7760   LearningRate 0.0071   Epoch: 14   Global Step: 182210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:58:34,343-Speed 3065.80 samples/sec   Loss 3.8365   LearningRate 0.0071   Epoch: 14   Global Step: 182220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:58:37,775-Speed 2984.36 samples/sec   Loss 3.9815   LearningRate 0.0071   Epoch: 14   Global Step: 182230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:58:41,165-Speed 3020.85 samples/sec   Loss 3.8433   LearningRate 0.0071   Epoch: 14   Global Step: 182240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:58:44,533-Speed 3041.77 samples/sec   Loss 3.8656   LearningRate 0.0071   Epoch: 14   Global Step: 182250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:58:47,865-Speed 3074.29 samples/sec   Loss 3.8595   LearningRate 0.0071   Epoch: 14   Global Step: 182260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 18:58:51,252-Speed 3024.24 samples/sec   Loss 3.8383   LearningRate 0.0071   Epoch: 14   Global Step: 182270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:58:54,569-Speed 3087.63 samples/sec   Loss 3.8888   LearningRate 0.0071   Epoch: 14   Global Step: 182280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:58:57,961-Speed 3020.18 samples/sec   Loss 3.7793   LearningRate 0.0071   Epoch: 14   Global Step: 182290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:01,327-Speed 3042.99 samples/sec   Loss 3.8422   LearningRate 0.0071   Epoch: 14   Global Step: 182300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:04,737-Speed 3004.09 samples/sec   Loss 3.8173   LearningRate 0.0071   Epoch: 14   Global Step: 182310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:08,210-Speed 2949.72 samples/sec   Loss 3.8049   LearningRate 0.0071   Epoch: 14   Global Step: 182320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:11,625-Speed 2998.96 samples/sec   Loss 3.8287   LearningRate 0.0071   Epoch: 14   Global Step: 182330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:15,004-Speed 3031.24 samples/sec   Loss 3.8303   LearningRate 0.0071   Epoch: 14   Global Step: 182340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:18,514-Speed 2918.01 samples/sec   Loss 3.8610   LearningRate 0.0071   Epoch: 14   Global Step: 182350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:21,824-Speed 3095.06 samples/sec   Loss 3.7808   LearningRate 0.0071   Epoch: 14   Global Step: 182360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:25,154-Speed 3076.04 samples/sec   Loss 3.7877   LearningRate 0.0071   Epoch: 14   Global Step: 182370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:59:28,501-Speed 3060.11 samples/sec   Loss 3.8961   LearningRate 0.0071   Epoch: 14   Global Step: 182380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:59:31,887-Speed 3024.53 samples/sec   Loss 3.8875   LearningRate 0.0071   Epoch: 14   Global Step: 182390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 18:59:35,288-Speed 3012.54 samples/sec   Loss 3.9040   LearningRate 0.0071   Epoch: 14   Global Step: 182400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:38,661-Speed 3036.61 samples/sec   Loss 3.8549   LearningRate 0.0071   Epoch: 14   Global Step: 182410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:42,043-Speed 3028.66 samples/sec   Loss 3.8397   LearningRate 0.0071   Epoch: 14   Global Step: 182420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:45,382-Speed 3067.81 samples/sec   Loss 3.8754   LearningRate 0.0071   Epoch: 14   Global Step: 182430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:48,757-Speed 3035.36 samples/sec   Loss 3.9450   LearningRate 0.0071   Epoch: 14   Global Step: 182440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:52,102-Speed 3062.45 samples/sec   Loss 3.8182   LearningRate 0.0071   Epoch: 14   Global Step: 182450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:55,544-Speed 2975.95 samples/sec   Loss 3.7240   LearningRate 0.0070   Epoch: 14   Global Step: 182460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 18:59:58,915-Speed 3038.65 samples/sec   Loss 3.8711   LearningRate 0.0070   Epoch: 14   Global Step: 182470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:00:02,288-Speed 3036.20 samples/sec   Loss 3.8781   LearningRate 0.0070   Epoch: 14   Global Step: 182480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:00:05,679-Speed 3020.85 samples/sec   Loss 3.8933   LearningRate 0.0070   Epoch: 14   Global Step: 182490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:00:09,113-Speed 2983.01 samples/sec   Loss 3.8662   LearningRate 0.0070   Epoch: 14   Global Step: 182500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:00:12,642-Speed 2902.46 samples/sec   Loss 3.8406   LearningRate 0.0070   Epoch: 14   Global Step: 182510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:00:16,015-Speed 3036.77 samples/sec   Loss 3.9324   LearningRate 0.0070   Epoch: 14   Global Step: 182520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:00:19,444-Speed 2987.62 samples/sec   Loss 3.8213   LearningRate 0.0070   Epoch: 14   Global Step: 182530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:00:22,890-Speed 2972.34 samples/sec   Loss 3.9242   LearningRate 0.0070   Epoch: 14   Global Step: 182540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:00:26,366-Speed 2947.04 samples/sec   Loss 3.9492   LearningRate 0.0070   Epoch: 14   Global Step: 182550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:00:29,817-Speed 2967.81 samples/sec   Loss 3.8721   LearningRate 0.0070   Epoch: 14   Global Step: 182560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:00:33,191-Speed 3035.89 samples/sec   Loss 3.8647   LearningRate 0.0070   Epoch: 14   Global Step: 182570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:00:36,605-Speed 3000.52 samples/sec   Loss 3.9270   LearningRate 0.0070   Epoch: 14   Global Step: 182580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:00:39,956-Speed 3057.37 samples/sec   Loss 3.8446   LearningRate 0.0070   Epoch: 14   Global Step: 182590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:00:43,284-Speed 3077.38 samples/sec   Loss 3.8844   LearningRate 0.0070   Epoch: 14   Global Step: 182600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:00:46,677-Speed 3019.37 samples/sec   Loss 3.9411   LearningRate 0.0070   Epoch: 14   Global Step: 182610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:00:50,055-Speed 3032.02 samples/sec   Loss 3.8094   LearningRate 0.0070   Epoch: 14   Global Step: 182620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:00:53,423-Speed 3041.53 samples/sec   Loss 3.8499   LearningRate 0.0070   Epoch: 14   Global Step: 182630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:00:56,770-Speed 3060.91 samples/sec   Loss 3.8916   LearningRate 0.0070   Epoch: 14   Global Step: 182640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:00,120-Speed 3057.58 samples/sec   Loss 3.8839   LearningRate 0.0070   Epoch: 14   Global Step: 182650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:03,527-Speed 3006.55 samples/sec   Loss 3.8359   LearningRate 0.0070   Epoch: 14   Global Step: 182660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:06,935-Speed 3005.47 samples/sec   Loss 3.8594   LearningRate 0.0070   Epoch: 14   Global Step: 182670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:10,341-Speed 3007.08 samples/sec   Loss 3.7980   LearningRate 0.0070   Epoch: 14   Global Step: 182680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:13,765-Speed 2991.23 samples/sec   Loss 3.9573   LearningRate 0.0070   Epoch: 14   Global Step: 182690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:17,149-Speed 3026.87 samples/sec   Loss 3.8111   LearningRate 0.0070   Epoch: 14   Global Step: 182700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:20,500-Speed 3057.06 samples/sec   Loss 3.7589   LearningRate 0.0070   Epoch: 14   Global Step: 182710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:23,885-Speed 3025.72 samples/sec   Loss 3.8115   LearningRate 0.0070   Epoch: 14   Global Step: 182720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:27,283-Speed 3014.33 samples/sec   Loss 3.9053   LearningRate 0.0070   Epoch: 14   Global Step: 182730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:30,710-Speed 2989.25 samples/sec   Loss 3.8871   LearningRate 0.0070   Epoch: 14   Global Step: 182740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:01:34,089-Speed 3031.49 samples/sec   Loss 3.9157   LearningRate 0.0070   Epoch: 14   Global Step: 182750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:37,437-Speed 3059.32 samples/sec   Loss 3.8474   LearningRate 0.0070   Epoch: 14   Global Step: 182760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:40,859-Speed 2993.20 samples/sec   Loss 3.8832   LearningRate 0.0070   Epoch: 14   Global Step: 182770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:44,311-Speed 2967.27 samples/sec   Loss 3.8478   LearningRate 0.0070   Epoch: 14   Global Step: 182780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:47,685-Speed 3035.16 samples/sec   Loss 3.9550   LearningRate 0.0070   Epoch: 14   Global Step: 182790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:51,085-Speed 3012.91 samples/sec   Loss 3.8411   LearningRate 0.0070   Epoch: 14   Global Step: 182800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:54,471-Speed 3024.96 samples/sec   Loss 3.8155   LearningRate 0.0070   Epoch: 14   Global Step: 182810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:01:57,973-Speed 2924.78 samples/sec   Loss 3.8958   LearningRate 0.0070   Epoch: 14   Global Step: 182820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:02:01,372-Speed 3013.73 samples/sec   Loss 3.9065   LearningRate 0.0070   Epoch: 14   Global Step: 182830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:02:04,745-Speed 3036.75 samples/sec   Loss 3.8677   LearningRate 0.0070   Epoch: 14   Global Step: 182840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:02:08,165-Speed 2994.95 samples/sec   Loss 3.9142   LearningRate 0.0070   Epoch: 14   Global Step: 182850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:02:11,592-Speed 2989.36 samples/sec   Loss 3.8381   LearningRate 0.0070   Epoch: 14   Global Step: 182860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:02:14,933-Speed 3065.53 samples/sec   Loss 3.9547   LearningRate 0.0070   Epoch: 14   Global Step: 182870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:02:18,270-Speed 3069.19 samples/sec   Loss 3.7845   LearningRate 0.0070   Epoch: 14   Global Step: 182880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:02:21,730-Speed 2960.57 samples/sec   Loss 3.7809   LearningRate 0.0070   Epoch: 14   Global Step: 182890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:02:25,111-Speed 3029.32 samples/sec   Loss 3.7990   LearningRate 0.0070   Epoch: 14   Global Step: 182900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:02:28,488-Speed 3033.60 samples/sec   Loss 3.8583   LearningRate 0.0070   Epoch: 14   Global Step: 182910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:02:31,865-Speed 3032.33 samples/sec   Loss 3.8518   LearningRate 0.0070   Epoch: 14   Global Step: 182920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:02:35,286-Speed 2993.91 samples/sec   Loss 3.8528   LearningRate 0.0069   Epoch: 14   Global Step: 182930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:02:38,651-Speed 3044.20 samples/sec   Loss 3.8493   LearningRate 0.0069   Epoch: 14   Global Step: 182940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:02:42,026-Speed 3035.54 samples/sec   Loss 3.7630   LearningRate 0.0069   Epoch: 14   Global Step: 182950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:02:45,401-Speed 3034.79 samples/sec   Loss 3.8947   LearningRate 0.0069   Epoch: 14   Global Step: 182960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:02:48,771-Speed 3039.40 samples/sec   Loss 3.8817   LearningRate 0.0069   Epoch: 14   Global Step: 182970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:02:52,138-Speed 3042.15 samples/sec   Loss 3.7808   LearningRate 0.0069   Epoch: 14   Global Step: 182980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:02:55,498-Speed 3049.48 samples/sec   Loss 3.8904   LearningRate 0.0069   Epoch: 14   Global Step: 182990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:02:58,858-Speed 3048.37 samples/sec   Loss 3.9087   LearningRate 0.0069   Epoch: 14   Global Step: 183000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:03:02,262-Speed 3009.27 samples/sec   Loss 3.8339   LearningRate 0.0069   Epoch: 14   Global Step: 183010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:03:05,643-Speed 3028.90 samples/sec   Loss 3.9383   LearningRate 0.0069   Epoch: 14   Global Step: 183020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:03:09,041-Speed 3015.00 samples/sec   Loss 3.7696   LearningRate 0.0069   Epoch: 14   Global Step: 183030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:03:12,435-Speed 3017.58 samples/sec   Loss 3.8123   LearningRate 0.0069   Epoch: 14   Global Step: 183040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:03:15,791-Speed 3052.74 samples/sec   Loss 3.8339   LearningRate 0.0069   Epoch: 14   Global Step: 183050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:03:19,225-Speed 2982.67 samples/sec   Loss 3.8752   LearningRate 0.0069   Epoch: 14   Global Step: 183060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:03:22,614-Speed 3021.94 samples/sec   Loss 3.8047   LearningRate 0.0069   Epoch: 14   Global Step: 183070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:03:26,049-Speed 2982.08 samples/sec   Loss 4.0022   LearningRate 0.0069   Epoch: 14   Global Step: 183080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:03:29,443-Speed 3018.42 samples/sec   Loss 3.8596   LearningRate 0.0069   Epoch: 14   Global Step: 183090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:03:32,892-Speed 2969.09 samples/sec   Loss 3.9289   LearningRate 0.0069   Epoch: 14   Global Step: 183100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:03:36,315-Speed 2992.37 samples/sec   Loss 3.7572   LearningRate 0.0069   Epoch: 14   Global Step: 183110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:03:39,767-Speed 2967.20 samples/sec   Loss 3.8569   LearningRate 0.0069   Epoch: 14   Global Step: 183120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:03:43,213-Speed 2973.08 samples/sec   Loss 3.8288   LearningRate 0.0069   Epoch: 14   Global Step: 183130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:03:46,646-Speed 2983.10 samples/sec   Loss 3.9191   LearningRate 0.0069   Epoch: 14   Global Step: 183140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:03:50,071-Speed 2991.64 samples/sec   Loss 3.8061   LearningRate 0.0069   Epoch: 14   Global Step: 183150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:03:53,412-Speed 3065.79 samples/sec   Loss 3.8279   LearningRate 0.0069   Epoch: 14   Global Step: 183160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:03:56,738-Speed 3079.64 samples/sec   Loss 3.8244   LearningRate 0.0069   Epoch: 14   Global Step: 183170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:04:00,138-Speed 3012.35 samples/sec   Loss 3.8225   LearningRate 0.0069   Epoch: 14   Global Step: 183180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:04:03,530-Speed 3019.58 samples/sec   Loss 3.8367   LearningRate 0.0069   Epoch: 14   Global Step: 183190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:04:06,854-Speed 3082.26 samples/sec   Loss 3.8799   LearningRate 0.0069   Epoch: 14   Global Step: 183200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:04:10,168-Speed 3091.15 samples/sec   Loss 3.8683   LearningRate 0.0069   Epoch: 14   Global Step: 183210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:04:13,568-Speed 3011.95 samples/sec   Loss 3.8203   LearningRate 0.0069   Epoch: 14   Global Step: 183220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:04:16,942-Speed 3035.57 samples/sec   Loss 3.8145   LearningRate 0.0069   Epoch: 14   Global Step: 183230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:04:20,311-Speed 3040.44 samples/sec   Loss 3.9331   LearningRate 0.0069   Epoch: 14   Global Step: 183240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:04:23,665-Speed 3054.45 samples/sec   Loss 3.8110   LearningRate 0.0069   Epoch: 14   Global Step: 183250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:04:27,057-Speed 3019.90 samples/sec   Loss 3.9250   LearningRate 0.0069   Epoch: 14   Global Step: 183260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:04:30,407-Speed 3057.03 samples/sec   Loss 3.8234   LearningRate 0.0069   Epoch: 14   Global Step: 183270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:04:33,793-Speed 3024.84 samples/sec   Loss 3.9289   LearningRate 0.0069   Epoch: 14   Global Step: 183280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:04:37,162-Speed 3040.74 samples/sec   Loss 3.9569   LearningRate 0.0069   Epoch: 14   Global Step: 183290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:04:40,597-Speed 2982.15 samples/sec   Loss 3.8550   LearningRate 0.0069   Epoch: 14   Global Step: 183300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:04:44,003-Speed 3006.76 samples/sec   Loss 3.8609   LearningRate 0.0069   Epoch: 14   Global Step: 183310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:04:47,428-Speed 2990.18 samples/sec   Loss 3.7757   LearningRate 0.0069   Epoch: 14   Global Step: 183320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:04:50,769-Speed 3066.78 samples/sec   Loss 3.8867   LearningRate 0.0069   Epoch: 14   Global Step: 183330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:04:54,107-Speed 3068.12 samples/sec   Loss 3.9091   LearningRate 0.0069   Epoch: 14   Global Step: 183340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:04:57,498-Speed 3021.37 samples/sec   Loss 3.8727   LearningRate 0.0069   Epoch: 14   Global Step: 183350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:05:00,866-Speed 3040.66 samples/sec   Loss 3.8500   LearningRate 0.0069   Epoch: 14   Global Step: 183360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:05:04,219-Speed 3055.02 samples/sec   Loss 3.9544   LearningRate 0.0069   Epoch: 14   Global Step: 183370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:05:07,554-Speed 3071.24 samples/sec   Loss 3.8142   LearningRate 0.0069   Epoch: 14   Global Step: 183380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:05:10,902-Speed 3061.05 samples/sec   Loss 3.8878   LearningRate 0.0069   Epoch: 14   Global Step: 183390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:05:14,353-Speed 2968.04 samples/sec   Loss 3.8089   LearningRate 0.0069   Epoch: 14   Global Step: 183400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:05:17,772-Speed 2996.48 samples/sec   Loss 3.8219   LearningRate 0.0068   Epoch: 14   Global Step: 183410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:05:21,160-Speed 3023.45 samples/sec   Loss 3.7835   LearningRate 0.0068   Epoch: 14   Global Step: 183420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:05:24,618-Speed 2962.87 samples/sec   Loss 3.8375   LearningRate 0.0068   Epoch: 14   Global Step: 183430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:05:28,055-Speed 2980.22 samples/sec   Loss 3.8373   LearningRate 0.0068   Epoch: 14   Global Step: 183440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:05:31,522-Speed 2954.40 samples/sec   Loss 3.7660   LearningRate 0.0068   Epoch: 14   Global Step: 183450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:05:34,880-Speed 3050.78 samples/sec   Loss 3.8476   LearningRate 0.0068   Epoch: 14   Global Step: 183460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:05:38,249-Speed 3041.07 samples/sec   Loss 3.8225   LearningRate 0.0068   Epoch: 14   Global Step: 183470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:05:41,727-Speed 2944.48 samples/sec   Loss 3.8155   LearningRate 0.0068   Epoch: 14   Global Step: 183480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:05:45,148-Speed 2994.08 samples/sec   Loss 3.8888   LearningRate 0.0068   Epoch: 14   Global Step: 183490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:05:48,608-Speed 2960.48 samples/sec   Loss 3.8476   LearningRate 0.0068   Epoch: 14   Global Step: 183500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:05:51,999-Speed 3020.41 samples/sec   Loss 3.9243   LearningRate 0.0068   Epoch: 14   Global Step: 183510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:05:55,357-Speed 3050.27 samples/sec   Loss 3.8475   LearningRate 0.0068   Epoch: 14   Global Step: 183520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:05:58,709-Speed 3055.97 samples/sec   Loss 3.8490   LearningRate 0.0068   Epoch: 14   Global Step: 183530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:06:02,086-Speed 3033.32 samples/sec   Loss 3.8152   LearningRate 0.0068   Epoch: 14   Global Step: 183540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:06:05,507-Speed 2993.71 samples/sec   Loss 3.8857   LearningRate 0.0068   Epoch: 14   Global Step: 183550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:06:08,869-Speed 3046.50 samples/sec   Loss 3.8663   LearningRate 0.0068   Epoch: 14   Global Step: 183560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:12,262-Speed 3018.83 samples/sec   Loss 3.8472   LearningRate 0.0068   Epoch: 14   Global Step: 183570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:15,673-Speed 3003.41 samples/sec   Loss 3.8921   LearningRate 0.0068   Epoch: 14   Global Step: 183580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:19,046-Speed 3037.02 samples/sec   Loss 3.9113   LearningRate 0.0068   Epoch: 14   Global Step: 183590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:22,464-Speed 2996.76 samples/sec   Loss 3.8538   LearningRate 0.0068   Epoch: 14   Global Step: 183600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:25,778-Speed 3090.13 samples/sec   Loss 3.8822   LearningRate 0.0068   Epoch: 14   Global Step: 183610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:29,133-Speed 3053.96 samples/sec   Loss 3.8770   LearningRate 0.0068   Epoch: 14   Global Step: 183620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:32,532-Speed 3013.01 samples/sec   Loss 3.7951   LearningRate 0.0068   Epoch: 14   Global Step: 183630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:35,958-Speed 2990.25 samples/sec   Loss 3.8543   LearningRate 0.0068   Epoch: 14   Global Step: 183640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:39,312-Speed 3054.12 samples/sec   Loss 3.8978   LearningRate 0.0068   Epoch: 14   Global Step: 183650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:42,727-Speed 2999.84 samples/sec   Loss 3.9243   LearningRate 0.0068   Epoch: 14   Global Step: 183660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:06:46,115-Speed 3022.62 samples/sec   Loss 3.8046   LearningRate 0.0068   Epoch: 14   Global Step: 183670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:06:49,455-Speed 3066.84 samples/sec   Loss 3.8129   LearningRate 0.0068   Epoch: 14   Global Step: 183680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:52,851-Speed 3016.70 samples/sec   Loss 3.8317   LearningRate 0.0068   Epoch: 14   Global Step: 183690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:56,192-Speed 3065.54 samples/sec   Loss 3.9097   LearningRate 0.0068   Epoch: 14   Global Step: 183700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:06:59,643-Speed 2967.90 samples/sec   Loss 3.8539   LearningRate 0.0068   Epoch: 14   Global Step: 183710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:07:02,989-Speed 3061.28 samples/sec   Loss 3.8970   LearningRate 0.0068   Epoch: 14   Global Step: 183720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:07:06,466-Speed 2946.10 samples/sec   Loss 3.9556   LearningRate 0.0068   Epoch: 14   Global Step: 183730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:07:09,860-Speed 3018.50 samples/sec   Loss 3.8601   LearningRate 0.0068   Epoch: 14   Global Step: 183740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:07:13,254-Speed 3018.11 samples/sec   Loss 3.8429   LearningRate 0.0068   Epoch: 14   Global Step: 183750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:07:16,613-Speed 3048.62 samples/sec   Loss 3.7719   LearningRate 0.0068   Epoch: 14   Global Step: 183760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:07:19,942-Speed 3077.47 samples/sec   Loss 3.8233   LearningRate 0.0068   Epoch: 14   Global Step: 183770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:07:23,343-Speed 3011.92 samples/sec   Loss 4.0004   LearningRate 0.0068   Epoch: 14   Global Step: 183780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:07:26,708-Speed 3044.27 samples/sec   Loss 3.8132   LearningRate 0.0068   Epoch: 14   Global Step: 183790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:07:30,156-Speed 2970.29 samples/sec   Loss 3.8002   LearningRate 0.0068   Epoch: 14   Global Step: 183800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:07:33,572-Speed 2998.85 samples/sec   Loss 3.8128   LearningRate 0.0068   Epoch: 14   Global Step: 183810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:07:37,070-Speed 2928.51 samples/sec   Loss 3.8811   LearningRate 0.0068   Epoch: 14   Global Step: 183820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:07:40,484-Speed 3000.20 samples/sec   Loss 3.8911   LearningRate 0.0068   Epoch: 14   Global Step: 183830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:07:43,853-Speed 3040.54 samples/sec   Loss 3.7227   LearningRate 0.0068   Epoch: 14   Global Step: 183840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:07:47,279-Speed 2989.10 samples/sec   Loss 3.8246   LearningRate 0.0068   Epoch: 14   Global Step: 183850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:07:50,685-Speed 3007.86 samples/sec   Loss 3.9585   LearningRate 0.0068   Epoch: 14   Global Step: 183860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:07:54,031-Speed 3060.73 samples/sec   Loss 3.8618   LearningRate 0.0068   Epoch: 14   Global Step: 183870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:07:57,369-Speed 3069.47 samples/sec   Loss 3.8268   LearningRate 0.0067   Epoch: 14   Global Step: 183880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:00,778-Speed 3004.48 samples/sec   Loss 3.7887   LearningRate 0.0067   Epoch: 14   Global Step: 183890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:04,111-Speed 3072.72 samples/sec   Loss 3.7488   LearningRate 0.0067   Epoch: 14   Global Step: 183900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:07,456-Speed 3062.92 samples/sec   Loss 3.8327   LearningRate 0.0067   Epoch: 14   Global Step: 183910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:10,888-Speed 2984.03 samples/sec   Loss 3.8834   LearningRate 0.0067   Epoch: 14   Global Step: 183920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:14,292-Speed 3009.03 samples/sec   Loss 3.9003   LearningRate 0.0067   Epoch: 14   Global Step: 183930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:17,687-Speed 3017.09 samples/sec   Loss 3.8310   LearningRate 0.0067   Epoch: 14   Global Step: 183940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:21,068-Speed 3030.19 samples/sec   Loss 3.7694   LearningRate 0.0067   Epoch: 14   Global Step: 183950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:24,434-Speed 3042.87 samples/sec   Loss 3.7950   LearningRate 0.0067   Epoch: 14   Global Step: 183960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:27,830-Speed 3015.78 samples/sec   Loss 3.8106   LearningRate 0.0067   Epoch: 14   Global Step: 183970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:31,286-Speed 2964.77 samples/sec   Loss 3.7842   LearningRate 0.0067   Epoch: 14   Global Step: 183980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:08:34,661-Speed 3035.40 samples/sec   Loss 3.8745   LearningRate 0.0067   Epoch: 14   Global Step: 183990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:08:38,023-Speed 3047.00 samples/sec   Loss 3.8465   LearningRate 0.0067   Epoch: 14   Global Step: 184000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:08:41,507-Speed 2939.79 samples/sec   Loss 3.7814   LearningRate 0.0067   Epoch: 14   Global Step: 184010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:08:44,838-Speed 3075.27 samples/sec   Loss 3.8607   LearningRate 0.0067   Epoch: 14   Global Step: 184020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:48,230-Speed 3019.01 samples/sec   Loss 3.8061   LearningRate 0.0067   Epoch: 14   Global Step: 184030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:51,558-Speed 3078.55 samples/sec   Loss 3.9081   LearningRate 0.0067   Epoch: 14   Global Step: 184040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:54,973-Speed 2998.93 samples/sec   Loss 3.9057   LearningRate 0.0067   Epoch: 14   Global Step: 184050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:08:58,307-Speed 3072.23 samples/sec   Loss 3.8652   LearningRate 0.0067   Epoch: 14   Global Step: 184060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:01,717-Speed 3003.51 samples/sec   Loss 3.7094   LearningRate 0.0067   Epoch: 14   Global Step: 184070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:05,060-Speed 3064.27 samples/sec   Loss 3.7867   LearningRate 0.0067   Epoch: 14   Global Step: 184080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:08,431-Speed 3038.98 samples/sec   Loss 3.7705   LearningRate 0.0067   Epoch: 14   Global Step: 184090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:11,795-Speed 3045.62 samples/sec   Loss 3.8635   LearningRate 0.0067   Epoch: 14   Global Step: 184100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:15,101-Speed 3098.43 samples/sec   Loss 3.8815   LearningRate 0.0067   Epoch: 14   Global Step: 184110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:18,482-Speed 3029.85 samples/sec   Loss 3.8030   LearningRate 0.0067   Epoch: 14   Global Step: 184120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:09:21,784-Speed 3101.40 samples/sec   Loss 3.8717   LearningRate 0.0067   Epoch: 14   Global Step: 184130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:09:25,121-Speed 3069.84 samples/sec   Loss 3.8563   LearningRate 0.0067   Epoch: 14   Global Step: 184140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:09:28,511-Speed 3021.33 samples/sec   Loss 3.9493   LearningRate 0.0067   Epoch: 14   Global Step: 184150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:31,966-Speed 2964.42 samples/sec   Loss 3.7873   LearningRate 0.0067   Epoch: 14   Global Step: 184160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:35,302-Speed 3070.50 samples/sec   Loss 3.8443   LearningRate 0.0067   Epoch: 14   Global Step: 184170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:38,714-Speed 3002.95 samples/sec   Loss 3.9012   LearningRate 0.0067   Epoch: 14   Global Step: 184180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:42,090-Speed 3033.63 samples/sec   Loss 3.7897   LearningRate 0.0067   Epoch: 14   Global Step: 184190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:45,431-Speed 3065.70 samples/sec   Loss 3.8246   LearningRate 0.0067   Epoch: 14   Global Step: 184200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:48,760-Speed 3077.21 samples/sec   Loss 3.9548   LearningRate 0.0067   Epoch: 14   Global Step: 184210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:52,102-Speed 3064.97 samples/sec   Loss 3.8575   LearningRate 0.0067   Epoch: 14   Global Step: 184220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:55,554-Speed 2966.71 samples/sec   Loss 3.8482   LearningRate 0.0067   Epoch: 14   Global Step: 184230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:09:58,901-Speed 3060.86 samples/sec   Loss 3.8256   LearningRate 0.0067   Epoch: 14   Global Step: 184240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:10:02,343-Speed 2975.35 samples/sec   Loss 3.8628   LearningRate 0.0067   Epoch: 14   Global Step: 184250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:05,728-Speed 3025.96 samples/sec   Loss 3.8810   LearningRate 0.0067   Epoch: 14   Global Step: 184260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:09,168-Speed 2977.51 samples/sec   Loss 3.7535   LearningRate 0.0067   Epoch: 14   Global Step: 184270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:12,501-Speed 3074.01 samples/sec   Loss 3.8856   LearningRate 0.0067   Epoch: 14   Global Step: 184280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:15,829-Speed 3076.95 samples/sec   Loss 3.8003   LearningRate 0.0067   Epoch: 14   Global Step: 184290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:19,201-Speed 3037.74 samples/sec   Loss 3.7690   LearningRate 0.0067   Epoch: 14   Global Step: 184300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:22,537-Speed 3070.62 samples/sec   Loss 3.8635   LearningRate 0.0067   Epoch: 14   Global Step: 184310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:25,892-Speed 3052.81 samples/sec   Loss 3.9096   LearningRate 0.0067   Epoch: 14   Global Step: 184320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:29,327-Speed 2981.70 samples/sec   Loss 3.7378   LearningRate 0.0067   Epoch: 14   Global Step: 184330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:32,684-Speed 3051.48 samples/sec   Loss 3.8161   LearningRate 0.0067   Epoch: 14   Global Step: 184340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:35,996-Speed 3093.12 samples/sec   Loss 3.7862   LearningRate 0.0067   Epoch: 14   Global Step: 184350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:10:39,340-Speed 3062.45 samples/sec   Loss 3.8358   LearningRate 0.0066   Epoch: 14   Global Step: 184360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:10:42,709-Speed 3040.88 samples/sec   Loss 3.8958   LearningRate 0.0066   Epoch: 14   Global Step: 184370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:10:46,118-Speed 3004.04 samples/sec   Loss 3.7081   LearningRate 0.0066   Epoch: 14   Global Step: 184380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:49,485-Speed 3042.42 samples/sec   Loss 3.8208   LearningRate 0.0066   Epoch: 14   Global Step: 184390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:52,875-Speed 3021.27 samples/sec   Loss 3.7641   LearningRate 0.0066   Epoch: 14   Global Step: 184400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:56,340-Speed 2956.64 samples/sec   Loss 3.8833   LearningRate 0.0066   Epoch: 14   Global Step: 184410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:10:59,713-Speed 3036.68 samples/sec   Loss 3.9187   LearningRate 0.0066   Epoch: 14   Global Step: 184420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:11:03,160-Speed 2970.99 samples/sec   Loss 3.8619   LearningRate 0.0066   Epoch: 14   Global Step: 184430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:11:06,594-Speed 2982.87 samples/sec   Loss 3.8938   LearningRate 0.0066   Epoch: 14   Global Step: 184440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:11:10,031-Speed 2979.81 samples/sec   Loss 3.8015   LearningRate 0.0066   Epoch: 14   Global Step: 184450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:11:13,443-Speed 3002.44 samples/sec   Loss 3.8155   LearningRate 0.0066   Epoch: 14   Global Step: 184460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:11:16,886-Speed 2975.16 samples/sec   Loss 3.8230   LearningRate 0.0066   Epoch: 14   Global Step: 184470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:11:20,303-Speed 2997.80 samples/sec   Loss 3.7380   LearningRate 0.0066   Epoch: 14   Global Step: 184480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:11:23,735-Speed 2984.46 samples/sec   Loss 3.6967   LearningRate 0.0066   Epoch: 14   Global Step: 184490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:11:27,206-Speed 2950.43 samples/sec   Loss 3.7862   LearningRate 0.0066   Epoch: 14   Global Step: 184500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:11:30,641-Speed 2982.59 samples/sec   Loss 3.9237   LearningRate 0.0066   Epoch: 14   Global Step: 184510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:11:34,033-Speed 3019.33 samples/sec   Loss 3.7552   LearningRate 0.0066   Epoch: 14   Global Step: 184520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:11:37,535-Speed 2924.79 samples/sec   Loss 3.8714   LearningRate 0.0066   Epoch: 14   Global Step: 184530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:11:40,920-Speed 3026.08 samples/sec   Loss 3.8426   LearningRate 0.0066   Epoch: 14   Global Step: 184540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:11:44,335-Speed 2999.51 samples/sec   Loss 3.9242   LearningRate 0.0066   Epoch: 14   Global Step: 184550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:11:47,744-Speed 3004.34 samples/sec   Loss 3.8193   LearningRate 0.0066   Epoch: 14   Global Step: 184560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:11:51,173-Speed 2987.89 samples/sec   Loss 3.8062   LearningRate 0.0066   Epoch: 14   Global Step: 184570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:11:54,547-Speed 3035.55 samples/sec   Loss 3.8279   LearningRate 0.0066   Epoch: 14   Global Step: 184580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:11:57,859-Speed 3091.91 samples/sec   Loss 3.8708   LearningRate 0.0066   Epoch: 14   Global Step: 184590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:12:01,205-Speed 3061.27 samples/sec   Loss 3.8665   LearningRate 0.0066   Epoch: 14   Global Step: 184600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:12:04,623-Speed 2997.21 samples/sec   Loss 3.8098   LearningRate 0.0066   Epoch: 14   Global Step: 184610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:12:07,965-Speed 3064.16 samples/sec   Loss 3.7995   LearningRate 0.0066   Epoch: 14   Global Step: 184620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:12:11,335-Speed 3039.64 samples/sec   Loss 3.8185   LearningRate 0.0066   Epoch: 14   Global Step: 184630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:12:14,752-Speed 2997.91 samples/sec   Loss 3.7989   LearningRate 0.0066   Epoch: 14   Global Step: 184640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:12:18,108-Speed 3051.98 samples/sec   Loss 3.9229   LearningRate 0.0066   Epoch: 14   Global Step: 184650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:12:21,574-Speed 2955.61 samples/sec   Loss 3.8250   LearningRate 0.0066   Epoch: 14   Global Step: 184660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:12:25,004-Speed 2985.50 samples/sec   Loss 3.8260   LearningRate 0.0066   Epoch: 14   Global Step: 184670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:12:28,355-Speed 3057.13 samples/sec   Loss 3.7272   LearningRate 0.0066   Epoch: 14   Global Step: 184680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:12:31,723-Speed 3040.97 samples/sec   Loss 3.7663   LearningRate 0.0066   Epoch: 14   Global Step: 184690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:12:35,071-Speed 3059.23 samples/sec   Loss 3.9349   LearningRate 0.0066   Epoch: 14   Global Step: 184700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:12:38,511-Speed 2977.02 samples/sec   Loss 3.8220   LearningRate 0.0066   Epoch: 14   Global Step: 184710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:12:41,833-Speed 3083.96 samples/sec   Loss 3.8359   LearningRate 0.0066   Epoch: 14   Global Step: 184720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:12:45,268-Speed 2982.04 samples/sec   Loss 3.8491   LearningRate 0.0066   Epoch: 14   Global Step: 184730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:12:48,659-Speed 3020.15 samples/sec   Loss 3.8414   LearningRate 0.0066   Epoch: 14   Global Step: 184740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:12:52,027-Speed 3041.81 samples/sec   Loss 3.8205   LearningRate 0.0066   Epoch: 14   Global Step: 184750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:12:55,440-Speed 3001.23 samples/sec   Loss 3.8674   LearningRate 0.0066   Epoch: 14   Global Step: 184760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:12:58,872-Speed 2984.15 samples/sec   Loss 3.7317   LearningRate 0.0066   Epoch: 14   Global Step: 184770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:13:02,202-Speed 3076.06 samples/sec   Loss 3.7683   LearningRate 0.0066   Epoch: 14   Global Step: 184780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:13:05,647-Speed 2973.72 samples/sec   Loss 3.8672   LearningRate 0.0066   Epoch: 14   Global Step: 184790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:13:09,118-Speed 2950.94 samples/sec   Loss 3.9063   LearningRate 0.0066   Epoch: 14   Global Step: 184800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:13:12,468-Speed 3057.28 samples/sec   Loss 3.8697   LearningRate 0.0066   Epoch: 14   Global Step: 184810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:13:15,926-Speed 2961.75 samples/sec   Loss 3.8932   LearningRate 0.0066   Epoch: 14   Global Step: 184820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:13:19,371-Speed 2973.61 samples/sec   Loss 3.8572   LearningRate 0.0066   Epoch: 14   Global Step: 184830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:13:22,754-Speed 3027.62 samples/sec   Loss 3.7746   LearningRate 0.0066   Epoch: 14   Global Step: 184840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:13:26,096-Speed 3064.55 samples/sec   Loss 3.8534   LearningRate 0.0065   Epoch: 14   Global Step: 184850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:13:29,469-Speed 3036.80 samples/sec   Loss 3.8459   LearningRate 0.0065   Epoch: 14   Global Step: 184860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:13:32,843-Speed 3036.32 samples/sec   Loss 3.8446   LearningRate 0.0065   Epoch: 14   Global Step: 184870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:13:36,206-Speed 3045.63 samples/sec   Loss 3.7947   LearningRate 0.0065   Epoch: 14   Global Step: 184880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:13:39,515-Speed 3094.51 samples/sec   Loss 3.8968   LearningRate 0.0065   Epoch: 14   Global Step: 184890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:13:42,837-Speed 3083.93 samples/sec   Loss 3.6810   LearningRate 0.0065   Epoch: 14   Global Step: 184900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:13:46,163-Speed 3079.21 samples/sec   Loss 3.8382   LearningRate 0.0065   Epoch: 14   Global Step: 184910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:13:49,499-Speed 3071.33 samples/sec   Loss 3.8619   LearningRate 0.0065   Epoch: 14   Global Step: 184920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:13:52,850-Speed 3056.69 samples/sec   Loss 3.8093   LearningRate 0.0065   Epoch: 14   Global Step: 184930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:13:56,185-Speed 3071.17 samples/sec   Loss 3.7121   LearningRate 0.0065   Epoch: 14   Global Step: 184940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:13:59,536-Speed 3056.95 samples/sec   Loss 3.9184   LearningRate 0.0065   Epoch: 14   Global Step: 184950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:14:02,913-Speed 3032.81 samples/sec   Loss 3.7676   LearningRate 0.0065   Epoch: 14   Global Step: 184960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:14:06,368-Speed 2964.42 samples/sec   Loss 3.8023   LearningRate 0.0065   Epoch: 14   Global Step: 184970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:14:09,820-Speed 2967.51 samples/sec   Loss 3.8878   LearningRate 0.0065   Epoch: 14   Global Step: 184980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:14:13,183-Speed 3045.29 samples/sec   Loss 3.7842   LearningRate 0.0065   Epoch: 14   Global Step: 184990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:14:16,601-Speed 2997.27 samples/sec   Loss 3.8437   LearningRate 0.0065   Epoch: 14   Global Step: 185000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:14:20,021-Speed 2995.01 samples/sec   Loss 3.7923   LearningRate 0.0065   Epoch: 14   Global Step: 185010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:14:23,398-Speed 3032.53 samples/sec   Loss 3.7333   LearningRate 0.0065   Epoch: 14   Global Step: 185020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:14:26,814-Speed 3001.55 samples/sec   Loss 3.9127   LearningRate 0.0065   Epoch: 14   Global Step: 185030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:14:30,278-Speed 2956.69 samples/sec   Loss 3.8492   LearningRate 0.0065   Epoch: 14   Global Step: 185040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:14:33,653-Speed 3035.23 samples/sec   Loss 3.8547   LearningRate 0.0065   Epoch: 14   Global Step: 185050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:14:37,062-Speed 3004.55 samples/sec   Loss 3.9550   LearningRate 0.0065   Epoch: 14   Global Step: 185060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:14:40,425-Speed 3045.89 samples/sec   Loss 3.7876   LearningRate 0.0065   Epoch: 14   Global Step: 185070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:14:43,785-Speed 3048.95 samples/sec   Loss 3.8467   LearningRate 0.0065   Epoch: 14   Global Step: 185080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:14:47,128-Speed 3063.79 samples/sec   Loss 3.8328   LearningRate 0.0065   Epoch: 14   Global Step: 185090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:14:50,630-Speed 2924.18 samples/sec   Loss 3.8232   LearningRate 0.0065   Epoch: 14   Global Step: 185100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:14:53,984-Speed 3054.14 samples/sec   Loss 3.7930   LearningRate 0.0065   Epoch: 14   Global Step: 185110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:14:57,389-Speed 3008.21 samples/sec   Loss 3.7693   LearningRate 0.0065   Epoch: 14   Global Step: 185120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:15:00,770-Speed 3030.08 samples/sec   Loss 3.8757   LearningRate 0.0065   Epoch: 14   Global Step: 185130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:15:04,187-Speed 2997.56 samples/sec   Loss 3.7304   LearningRate 0.0065   Epoch: 14   Global Step: 185140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:15:07,592-Speed 3007.71 samples/sec   Loss 3.9081   LearningRate 0.0065   Epoch: 14   Global Step: 185150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:15:10,946-Speed 3054.54 samples/sec   Loss 3.7363   LearningRate 0.0065   Epoch: 14   Global Step: 185160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:15:14,293-Speed 3060.26 samples/sec   Loss 3.8644   LearningRate 0.0065   Epoch: 14   Global Step: 185170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:15:17,658-Speed 3043.46 samples/sec   Loss 3.8473   LearningRate 0.0065   Epoch: 14   Global Step: 185180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:15:21,067-Speed 3005.01 samples/sec   Loss 3.7910   LearningRate 0.0065   Epoch: 14   Global Step: 185190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:15:24,449-Speed 3028.15 samples/sec   Loss 3.7944   LearningRate 0.0065   Epoch: 14   Global Step: 185200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:15:27,819-Speed 3039.68 samples/sec   Loss 3.8958   LearningRate 0.0065   Epoch: 14   Global Step: 185210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:15:31,152-Speed 3073.94 samples/sec   Loss 3.8623   LearningRate 0.0065   Epoch: 14   Global Step: 185220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:15:34,637-Speed 2939.21 samples/sec   Loss 3.8380   LearningRate 0.0065   Epoch: 14   Global Step: 185230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:15:38,039-Speed 3010.63 samples/sec   Loss 3.8376   LearningRate 0.0065   Epoch: 14   Global Step: 185240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:15:41,363-Speed 3081.10 samples/sec   Loss 3.7308   LearningRate 0.0065   Epoch: 14   Global Step: 185250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:15:44,747-Speed 3026.77 samples/sec   Loss 3.8801   LearningRate 0.0065   Epoch: 14   Global Step: 185260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:15:48,233-Speed 2938.39 samples/sec   Loss 3.8090   LearningRate 0.0065   Epoch: 14   Global Step: 185270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:15:51,684-Speed 2968.52 samples/sec   Loss 3.8140   LearningRate 0.0065   Epoch: 14   Global Step: 185280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:15:55,062-Speed 3031.69 samples/sec   Loss 3.7721   LearningRate 0.0065   Epoch: 14   Global Step: 185290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:15:58,470-Speed 3006.10 samples/sec   Loss 3.8131   LearningRate 0.0065   Epoch: 14   Global Step: 185300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:16:01,873-Speed 3009.41 samples/sec   Loss 3.7855   LearningRate 0.0065   Epoch: 14   Global Step: 185310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:16:05,216-Speed 3063.69 samples/sec   Loss 3.8685   LearningRate 0.0065   Epoch: 14   Global Step: 185320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:16:08,641-Speed 2991.41 samples/sec   Loss 3.8219   LearningRate 0.0064   Epoch: 14   Global Step: 185330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:16:12,067-Speed 2989.56 samples/sec   Loss 3.8493   LearningRate 0.0064   Epoch: 14   Global Step: 185340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:16:15,460-Speed 3018.61 samples/sec   Loss 3.7890   LearningRate 0.0064   Epoch: 14   Global Step: 185350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:16:18,843-Speed 3027.51 samples/sec   Loss 3.7664   LearningRate 0.0064   Epoch: 14   Global Step: 185360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:16:22,201-Speed 3050.56 samples/sec   Loss 3.7543   LearningRate 0.0064   Epoch: 14   Global Step: 185370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:16:25,594-Speed 3019.18 samples/sec   Loss 3.7822   LearningRate 0.0064   Epoch: 14   Global Step: 185380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:16:29,069-Speed 2947.70 samples/sec   Loss 3.7497   LearningRate 0.0064   Epoch: 14   Global Step: 185390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:16:32,509-Speed 2977.14 samples/sec   Loss 3.8540   LearningRate 0.0064   Epoch: 14   Global Step: 185400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:16:35,884-Speed 3035.20 samples/sec   Loss 3.8020   LearningRate 0.0064   Epoch: 14   Global Step: 185410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:16:39,284-Speed 3012.48 samples/sec   Loss 3.7536   LearningRate 0.0064   Epoch: 14   Global Step: 185420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:16:42,740-Speed 2964.59 samples/sec   Loss 3.8021   LearningRate 0.0064   Epoch: 14   Global Step: 185430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:16:46,209-Speed 2952.55 samples/sec   Loss 3.8764   LearningRate 0.0064   Epoch: 14   Global Step: 185440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:16:49,593-Speed 3026.98 samples/sec   Loss 3.8061   LearningRate 0.0064   Epoch: 14   Global Step: 185450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:16:53,112-Speed 2910.40 samples/sec   Loss 3.8820   LearningRate 0.0064   Epoch: 14   Global Step: 185460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:16:56,558-Speed 2972.46 samples/sec   Loss 3.8013   LearningRate 0.0064   Epoch: 14   Global Step: 185470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:16:59,928-Speed 3039.93 samples/sec   Loss 3.7439   LearningRate 0.0064   Epoch: 14   Global Step: 185480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:17:03,319-Speed 3020.80 samples/sec   Loss 3.8560   LearningRate 0.0064   Epoch: 14   Global Step: 185490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:17:06,795-Speed 2946.71 samples/sec   Loss 3.8864   LearningRate 0.0064   Epoch: 14   Global Step: 185500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:17:10,166-Speed 3038.36 samples/sec   Loss 3.7766   LearningRate 0.0064   Epoch: 14   Global Step: 185510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:17:13,621-Speed 2964.87 samples/sec   Loss 3.7609   LearningRate 0.0064   Epoch: 14   Global Step: 185520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:17:17,063-Speed 2975.81 samples/sec   Loss 3.7885   LearningRate 0.0064   Epoch: 14   Global Step: 185530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:17:20,418-Speed 3052.44 samples/sec   Loss 3.8438   LearningRate 0.0064   Epoch: 14   Global Step: 185540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:17:23,744-Speed 3080.20 samples/sec   Loss 3.7666   LearningRate 0.0064   Epoch: 14   Global Step: 185550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:17:27,119-Speed 3034.57 samples/sec   Loss 3.7763   LearningRate 0.0064   Epoch: 14   Global Step: 185560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:17:30,472-Speed 3054.78 samples/sec   Loss 3.8362   LearningRate 0.0064   Epoch: 14   Global Step: 185570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:17:33,878-Speed 3008.17 samples/sec   Loss 3.7979   LearningRate 0.0064   Epoch: 14   Global Step: 185580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:17:37,209-Speed 3074.93 samples/sec   Loss 3.7840   LearningRate 0.0064   Epoch: 14   Global Step: 185590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:17:40,521-Speed 3092.92 samples/sec   Loss 3.8219   LearningRate 0.0064   Epoch: 14   Global Step: 185600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:17:43,897-Speed 3033.45 samples/sec   Loss 3.8380   LearningRate 0.0064   Epoch: 14   Global Step: 185610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:17:47,244-Speed 3060.54 samples/sec   Loss 3.8196   LearningRate 0.0064   Epoch: 14   Global Step: 185620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:17:50,694-Speed 2968.61 samples/sec   Loss 3.8784   LearningRate 0.0064   Epoch: 14   Global Step: 185630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:17:54,022-Speed 3077.83 samples/sec   Loss 3.6924   LearningRate 0.0064   Epoch: 14   Global Step: 185640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:17:57,425-Speed 3010.03 samples/sec   Loss 3.9275   LearningRate 0.0064   Epoch: 14   Global Step: 185650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:00,774-Speed 3058.22 samples/sec   Loss 3.8386   LearningRate 0.0064   Epoch: 14   Global Step: 185660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:04,133-Speed 3050.07 samples/sec   Loss 3.7484   LearningRate 0.0064   Epoch: 14   Global Step: 185670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:07,500-Speed 3042.46 samples/sec   Loss 3.8456   LearningRate 0.0064   Epoch: 14   Global Step: 185680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:10,867-Speed 3041.89 samples/sec   Loss 3.7974   LearningRate 0.0064   Epoch: 14   Global Step: 185690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:18:14,255-Speed 3022.83 samples/sec   Loss 3.7002   LearningRate 0.0064   Epoch: 14   Global Step: 185700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:18:17,602-Speed 3060.15 samples/sec   Loss 3.8840   LearningRate 0.0064   Epoch: 14   Global Step: 185710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:18:20,976-Speed 3036.46 samples/sec   Loss 3.8496   LearningRate 0.0064   Epoch: 14   Global Step: 185720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:18:24,413-Speed 2980.49 samples/sec   Loss 3.7278   LearningRate 0.0064   Epoch: 14   Global Step: 185730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:18:27,824-Speed 3002.76 samples/sec   Loss 3.7540   LearningRate 0.0064   Epoch: 14   Global Step: 185740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:31,181-Speed 3051.25 samples/sec   Loss 3.8216   LearningRate 0.0064   Epoch: 14   Global Step: 185750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:34,612-Speed 2985.28 samples/sec   Loss 3.8347   LearningRate 0.0064   Epoch: 14   Global Step: 185760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:38,087-Speed 2947.10 samples/sec   Loss 3.7753   LearningRate 0.0064   Epoch: 14   Global Step: 185770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:41,442-Speed 3053.68 samples/sec   Loss 3.7469   LearningRate 0.0064   Epoch: 14   Global Step: 185780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:44,761-Speed 3085.26 samples/sec   Loss 3.7676   LearningRate 0.0064   Epoch: 14   Global Step: 185790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:48,160-Speed 3013.75 samples/sec   Loss 3.7504   LearningRate 0.0064   Epoch: 14   Global Step: 185800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:51,491-Speed 3075.22 samples/sec   Loss 3.7840   LearningRate 0.0064   Epoch: 14   Global Step: 185810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:54,999-Speed 2919.80 samples/sec   Loss 3.8347   LearningRate 0.0064   Epoch: 14   Global Step: 185820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:18:58,373-Speed 3036.08 samples/sec   Loss 3.8198   LearningRate 0.0063   Epoch: 14   Global Step: 185830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:19:01,740-Speed 3041.81 samples/sec   Loss 3.8573   LearningRate 0.0063   Epoch: 14   Global Step: 185840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:05,157-Speed 2997.92 samples/sec   Loss 3.8260   LearningRate 0.0063   Epoch: 14   Global Step: 185850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:08,498-Speed 3065.72 samples/sec   Loss 3.8453   LearningRate 0.0063   Epoch: 14   Global Step: 185860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:11,865-Speed 3042.14 samples/sec   Loss 3.8632   LearningRate 0.0063   Epoch: 14   Global Step: 185870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:15,191-Speed 3079.90 samples/sec   Loss 3.6983   LearningRate 0.0063   Epoch: 14   Global Step: 185880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:18,551-Speed 3047.89 samples/sec   Loss 3.7956   LearningRate 0.0063   Epoch: 14   Global Step: 185890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:22,048-Speed 2929.09 samples/sec   Loss 3.8691   LearningRate 0.0063   Epoch: 14   Global Step: 185900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:25,449-Speed 3011.96 samples/sec   Loss 3.7536   LearningRate 0.0063   Epoch: 14   Global Step: 185910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:28,796-Speed 3060.39 samples/sec   Loss 3.7292   LearningRate 0.0063   Epoch: 14   Global Step: 185920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:32,204-Speed 3006.03 samples/sec   Loss 3.6972   LearningRate 0.0063   Epoch: 14   Global Step: 185930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:35,537-Speed 3072.61 samples/sec   Loss 3.7933   LearningRate 0.0063   Epoch: 14   Global Step: 185940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:19:38,851-Speed 3090.96 samples/sec   Loss 3.7670   LearningRate 0.0063   Epoch: 14   Global Step: 185950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:19:42,236-Speed 3025.54 samples/sec   Loss 3.7737   LearningRate 0.0063   Epoch: 14   Global Step: 185960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:19:45,576-Speed 3066.75 samples/sec   Loss 3.7318   LearningRate 0.0063   Epoch: 14   Global Step: 185970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:48,924-Speed 3059.81 samples/sec   Loss 3.7998   LearningRate 0.0063   Epoch: 14   Global Step: 185980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:52,300-Speed 3033.76 samples/sec   Loss 3.7594   LearningRate 0.0063   Epoch: 14   Global Step: 185990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:19:55,663-Speed 3045.30 samples/sec   Loss 3.7579   LearningRate 0.0063   Epoch: 14   Global Step: 186000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:19:59,040-Speed 3033.63 samples/sec   Loss 3.8663   LearningRate 0.0063   Epoch: 14   Global Step: 186010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:20:02,414-Speed 3035.55 samples/sec   Loss 3.8105   LearningRate 0.0063   Epoch: 14   Global Step: 186020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:20:05,785-Speed 3038.83 samples/sec   Loss 3.8887   LearningRate 0.0063   Epoch: 14   Global Step: 186030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:20:09,124-Speed 3067.97 samples/sec   Loss 3.7787   LearningRate 0.0063   Epoch: 14   Global Step: 186040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:20:12,470-Speed 3060.59 samples/sec   Loss 3.7846   LearningRate 0.0063   Epoch: 14   Global Step: 186050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:20:15,827-Speed 3051.88 samples/sec   Loss 3.7933   LearningRate 0.0063   Epoch: 14   Global Step: 186060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:20:19,252-Speed 2990.32 samples/sec   Loss 3.7250   LearningRate 0.0063   Epoch: 14   Global Step: 186070   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:20:22,675-Speed 2992.24 samples/sec   Loss 3.7872   LearningRate 0.0063   Epoch: 14   Global Step: 186080   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:20:26,069-Speed 3018.26 samples/sec   Loss 3.7371   LearningRate 0.0063   Epoch: 14   Global Step: 186090   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:20:29,536-Speed 2954.36 samples/sec   Loss 3.8532   LearningRate 0.0063   Epoch: 14   Global Step: 186100   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:20:32,905-Speed 3040.68 samples/sec   Loss 3.7403   LearningRate 0.0063   Epoch: 14   Global Step: 186110   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:20:36,283-Speed 3032.32 samples/sec   Loss 3.6690   LearningRate 0.0063   Epoch: 14   Global Step: 186120   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:20:39,756-Speed 2948.82 samples/sec   Loss 3.8180   LearningRate 0.0063   Epoch: 14   Global Step: 186130   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:20:43,166-Speed 3003.76 samples/sec   Loss 3.8133   LearningRate 0.0063   Epoch: 14   Global Step: 186140   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:20:46,501-Speed 3071.89 samples/sec   Loss 3.7874   LearningRate 0.0063   Epoch: 14   Global Step: 186150   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:20:49,979-Speed 2945.27 samples/sec   Loss 3.7841   LearningRate 0.0063   Epoch: 14   Global Step: 186160   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:20:53,372-Speed 3018.86 samples/sec   Loss 3.8024   LearningRate 0.0063   Epoch: 14   Global Step: 186170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:20:56,734-Speed 3046.37 samples/sec   Loss 3.7269   LearningRate 0.0063   Epoch: 14   Global Step: 186180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:21:00,140-Speed 3007.52 samples/sec   Loss 3.7098   LearningRate 0.0063   Epoch: 14   Global Step: 186190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:21:03,476-Speed 3069.82 samples/sec   Loss 3.8216   LearningRate 0.0063   Epoch: 14   Global Step: 186200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:21:06,843-Speed 3042.69 samples/sec   Loss 3.7771   LearningRate 0.0063   Epoch: 14   Global Step: 186210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:21:10,201-Speed 3050.32 samples/sec   Loss 3.7749   LearningRate 0.0063   Epoch: 14   Global Step: 186220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:21:13,567-Speed 3043.10 samples/sec   Loss 3.7779   LearningRate 0.0063   Epoch: 14   Global Step: 186230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:21:16,976-Speed 3003.81 samples/sec   Loss 3.7975   LearningRate 0.0063   Epoch: 14   Global Step: 186240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:21:20,367-Speed 3021.18 samples/sec   Loss 3.7063   LearningRate 0.0063   Epoch: 14   Global Step: 186250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:21:23,733-Speed 3043.06 samples/sec   Loss 3.7437   LearningRate 0.0063   Epoch: 14   Global Step: 186260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:21:27,121-Speed 3022.72 samples/sec   Loss 3.8310   LearningRate 0.0063   Epoch: 14   Global Step: 186270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:21:30,457-Speed 3070.44 samples/sec   Loss 3.6376   LearningRate 0.0063   Epoch: 14   Global Step: 186280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:21:33,881-Speed 2991.28 samples/sec   Loss 3.8506   LearningRate 0.0063   Epoch: 14   Global Step: 186290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:21:37,291-Speed 3004.27 samples/sec   Loss 3.7802   LearningRate 0.0063   Epoch: 14   Global Step: 186300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:21:41,106-Speed 2684.75 samples/sec   Loss 3.8073   LearningRate 0.0063   Epoch: 14   Global Step: 186310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:22:14,102-Speed 310.35 samples/sec   Loss 3.2311   LearningRate 0.0062   Epoch: 15   Global Step: 186320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:22:17,534-Speed 2984.77 samples/sec   Loss 2.5887   LearningRate 0.0062   Epoch: 15   Global Step: 186330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:22:21,220-Speed 2779.21 samples/sec   Loss 2.4440   LearningRate 0.0062   Epoch: 15   Global Step: 186340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:22:24,628-Speed 3005.23 samples/sec   Loss 2.6139   LearningRate 0.0062   Epoch: 15   Global Step: 186350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:22:27,986-Speed 3051.19 samples/sec   Loss 2.5951   LearningRate 0.0062   Epoch: 15   Global Step: 186360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:22:31,403-Speed 2997.06 samples/sec   Loss 2.5864   LearningRate 0.0062   Epoch: 15   Global Step: 186370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:22:34,779-Speed 3033.77 samples/sec   Loss 2.4867   LearningRate 0.0062   Epoch: 15   Global Step: 186380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:22:38,128-Speed 3059.14 samples/sec   Loss 2.5750   LearningRate 0.0062   Epoch: 15   Global Step: 186390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:22:41,542-Speed 3000.55 samples/sec   Loss 2.4762   LearningRate 0.0062   Epoch: 15   Global Step: 186400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:22:44,930-Speed 3022.72 samples/sec   Loss 2.5658   LearningRate 0.0062   Epoch: 15   Global Step: 186410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:22:48,350-Speed 2994.90 samples/sec   Loss 2.5354   LearningRate 0.0062   Epoch: 15   Global Step: 186420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:22:51,745-Speed 3017.92 samples/sec   Loss 2.5427   LearningRate 0.0062   Epoch: 15   Global Step: 186430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:22:55,116-Speed 3037.97 samples/sec   Loss 2.5589   LearningRate 0.0062   Epoch: 15   Global Step: 186440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:22:58,478-Speed 3047.26 samples/sec   Loss 2.4892   LearningRate 0.0062   Epoch: 15   Global Step: 186450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:01,865-Speed 3024.47 samples/sec   Loss 2.6255   LearningRate 0.0062   Epoch: 15   Global Step: 186460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:05,249-Speed 3026.34 samples/sec   Loss 2.5667   LearningRate 0.0062   Epoch: 15   Global Step: 186470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:08,626-Speed 3033.25 samples/sec   Loss 2.5607   LearningRate 0.0062   Epoch: 15   Global Step: 186480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:12,067-Speed 2977.15 samples/sec   Loss 2.5827   LearningRate 0.0062   Epoch: 15   Global Step: 186490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:15,381-Speed 3090.76 samples/sec   Loss 2.5766   LearningRate 0.0062   Epoch: 15   Global Step: 186500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:18,771-Speed 3021.16 samples/sec   Loss 2.5400   LearningRate 0.0062   Epoch: 15   Global Step: 186510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:22,124-Speed 3054.78 samples/sec   Loss 2.6483   LearningRate 0.0062   Epoch: 15   Global Step: 186520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:25,501-Speed 3032.64 samples/sec   Loss 2.5840   LearningRate 0.0062   Epoch: 15   Global Step: 186530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:28,933-Speed 2985.37 samples/sec   Loss 2.5566   LearningRate 0.0062   Epoch: 15   Global Step: 186540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:32,306-Speed 3036.39 samples/sec   Loss 2.5131   LearningRate 0.0062   Epoch: 15   Global Step: 186550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:35,730-Speed 2991.32 samples/sec   Loss 2.5415   LearningRate 0.0062   Epoch: 15   Global Step: 186560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:39,121-Speed 3020.82 samples/sec   Loss 2.6124   LearningRate 0.0062   Epoch: 15   Global Step: 186570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:23:42,507-Speed 3025.31 samples/sec   Loss 2.5562   LearningRate 0.0062   Epoch: 15   Global Step: 186580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:23:45,854-Speed 3060.57 samples/sec   Loss 2.5923   LearningRate 0.0062   Epoch: 15   Global Step: 186590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:23:49,347-Speed 2932.24 samples/sec   Loss 2.5442   LearningRate 0.0062   Epoch: 15   Global Step: 186600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:23:52,740-Speed 3018.98 samples/sec   Loss 2.6755   LearningRate 0.0062   Epoch: 15   Global Step: 186610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:23:56,188-Speed 2970.57 samples/sec   Loss 2.6054   LearningRate 0.0062   Epoch: 15   Global Step: 186620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:23:59,629-Speed 2977.14 samples/sec   Loss 2.6216   LearningRate 0.0062   Epoch: 15   Global Step: 186630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:03,103-Speed 2949.14 samples/sec   Loss 2.5277   LearningRate 0.0062   Epoch: 15   Global Step: 186640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:06,581-Speed 2944.72 samples/sec   Loss 2.5714   LearningRate 0.0062   Epoch: 15   Global Step: 186650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:10,643-Speed 2521.46 samples/sec   Loss 2.6195   LearningRate 0.0062   Epoch: 15   Global Step: 186660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:14,435-Speed 2701.76 samples/sec   Loss 2.6524   LearningRate 0.0062   Epoch: 15   Global Step: 186670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:18,423-Speed 2568.37 samples/sec   Loss 2.5897   LearningRate 0.0062   Epoch: 15   Global Step: 186680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:24:22,151-Speed 2748.54 samples/sec   Loss 2.5928   LearningRate 0.0062   Epoch: 15   Global Step: 186690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:24:25,482-Speed 3075.43 samples/sec   Loss 2.6192   LearningRate 0.0062   Epoch: 15   Global Step: 186700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:24:29,492-Speed 2554.46 samples/sec   Loss 2.6320   LearningRate 0.0062   Epoch: 15   Global Step: 186710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:24:32,837-Speed 3061.37 samples/sec   Loss 2.6746   LearningRate 0.0062   Epoch: 15   Global Step: 186720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:36,185-Speed 3059.85 samples/sec   Loss 2.6434   LearningRate 0.0062   Epoch: 15   Global Step: 186730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:39,592-Speed 3006.36 samples/sec   Loss 2.6170   LearningRate 0.0062   Epoch: 15   Global Step: 186740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:43,016-Speed 2991.65 samples/sec   Loss 2.5857   LearningRate 0.0062   Epoch: 15   Global Step: 186750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:46,439-Speed 2992.33 samples/sec   Loss 2.6328   LearningRate 0.0062   Epoch: 15   Global Step: 186760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:49,813-Speed 3036.28 samples/sec   Loss 2.6184   LearningRate 0.0062   Epoch: 15   Global Step: 186770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:53,209-Speed 3016.07 samples/sec   Loss 2.6576   LearningRate 0.0062   Epoch: 15   Global Step: 186780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:56,606-Speed 3015.17 samples/sec   Loss 2.5563   LearningRate 0.0062   Epoch: 15   Global Step: 186790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:24:59,985-Speed 3031.41 samples/sec   Loss 2.6394   LearningRate 0.0062   Epoch: 15   Global Step: 186800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:25:03,462-Speed 2945.46 samples/sec   Loss 2.6533   LearningRate 0.0062   Epoch: 15   Global Step: 186810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:25:06,827-Speed 3044.63 samples/sec   Loss 2.6508   LearningRate 0.0061   Epoch: 15   Global Step: 186820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:25:10,231-Speed 3008.78 samples/sec   Loss 2.6105   LearningRate 0.0061   Epoch: 15   Global Step: 186830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:25:13,688-Speed 2963.28 samples/sec   Loss 2.6092   LearningRate 0.0061   Epoch: 15   Global Step: 186840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:25:17,056-Speed 3040.96 samples/sec   Loss 2.6081   LearningRate 0.0061   Epoch: 15   Global Step: 186850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:25:20,456-Speed 3012.45 samples/sec   Loss 2.6253   LearningRate 0.0061   Epoch: 15   Global Step: 186860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:25:23,805-Speed 3059.07 samples/sec   Loss 2.5562   LearningRate 0.0061   Epoch: 15   Global Step: 186870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:25:27,210-Speed 3008.15 samples/sec   Loss 2.6361   LearningRate 0.0061   Epoch: 15   Global Step: 186880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:25:30,659-Speed 2969.70 samples/sec   Loss 2.6448   LearningRate 0.0061   Epoch: 15   Global Step: 186890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:25:34,069-Speed 3003.81 samples/sec   Loss 2.6746   LearningRate 0.0061   Epoch: 15   Global Step: 186900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:25:37,445-Speed 3034.23 samples/sec   Loss 2.6480   LearningRate 0.0061   Epoch: 15   Global Step: 186910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:25:40,836-Speed 3020.87 samples/sec   Loss 2.7014   LearningRate 0.0061   Epoch: 15   Global Step: 186920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:25:44,257-Speed 2993.90 samples/sec   Loss 2.7329   LearningRate 0.0061   Epoch: 15   Global Step: 186930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:25:47,731-Speed 2948.92 samples/sec   Loss 2.6036   LearningRate 0.0061   Epoch: 15   Global Step: 186940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:25:51,149-Speed 2996.21 samples/sec   Loss 2.6404   LearningRate 0.0061   Epoch: 15   Global Step: 186950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:25:54,529-Speed 3031.18 samples/sec   Loss 2.6403   LearningRate 0.0061   Epoch: 15   Global Step: 186960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:25:57,994-Speed 2955.91 samples/sec   Loss 2.6116   LearningRate 0.0061   Epoch: 15   Global Step: 186970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:26:01,403-Speed 3004.11 samples/sec   Loss 2.6131   LearningRate 0.0061   Epoch: 15   Global Step: 186980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:26:04,850-Speed 2971.84 samples/sec   Loss 2.6404   LearningRate 0.0061   Epoch: 15   Global Step: 186990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:26:08,204-Speed 3053.84 samples/sec   Loss 2.6895   LearningRate 0.0061   Epoch: 15   Global Step: 187000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:11,638-Speed 2983.27 samples/sec   Loss 2.6651   LearningRate 0.0061   Epoch: 15   Global Step: 187010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:15,006-Speed 3041.08 samples/sec   Loss 2.6056   LearningRate 0.0061   Epoch: 15   Global Step: 187020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:19,001-Speed 2563.64 samples/sec   Loss 2.6387   LearningRate 0.0061   Epoch: 15   Global Step: 187030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:22,349-Speed 3059.72 samples/sec   Loss 2.6490   LearningRate 0.0061   Epoch: 15   Global Step: 187040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:25,777-Speed 2988.15 samples/sec   Loss 2.7026   LearningRate 0.0061   Epoch: 15   Global Step: 187050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:29,783-Speed 2556.95 samples/sec   Loss 2.6205   LearningRate 0.0061   Epoch: 15   Global Step: 187060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:33,198-Speed 2998.29 samples/sec   Loss 2.6593   LearningRate 0.0061   Epoch: 15   Global Step: 187070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:36,555-Speed 3051.70 samples/sec   Loss 2.7245   LearningRate 0.0061   Epoch: 15   Global Step: 187080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:39,938-Speed 3027.95 samples/sec   Loss 2.7041   LearningRate 0.0061   Epoch: 15   Global Step: 187090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:43,315-Speed 3034.50 samples/sec   Loss 2.6878   LearningRate 0.0061   Epoch: 15   Global Step: 187100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:26:46,717-Speed 3011.09 samples/sec   Loss 2.6693   LearningRate 0.0061   Epoch: 15   Global Step: 187110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:50,216-Speed 2927.18 samples/sec   Loss 2.6720   LearningRate 0.0061   Epoch: 15   Global Step: 187120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:53,607-Speed 3020.81 samples/sec   Loss 2.6143   LearningRate 0.0061   Epoch: 15   Global Step: 187130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:26:56,968-Speed 3047.89 samples/sec   Loss 2.6269   LearningRate 0.0061   Epoch: 15   Global Step: 187140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:00,403-Speed 2982.39 samples/sec   Loss 2.6708   LearningRate 0.0061   Epoch: 15   Global Step: 187150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:03,801-Speed 3014.26 samples/sec   Loss 2.6138   LearningRate 0.0061   Epoch: 15   Global Step: 187160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:07,196-Speed 3016.80 samples/sec   Loss 2.7614   LearningRate 0.0061   Epoch: 15   Global Step: 187170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:10,556-Speed 3048.70 samples/sec   Loss 2.7156   LearningRate 0.0061   Epoch: 15   Global Step: 187180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:13,913-Speed 3051.54 samples/sec   Loss 2.6746   LearningRate 0.0061   Epoch: 15   Global Step: 187190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:17,271-Speed 3049.62 samples/sec   Loss 2.5863   LearningRate 0.0061   Epoch: 15   Global Step: 187200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:20,678-Speed 3006.55 samples/sec   Loss 2.6192   LearningRate 0.0061   Epoch: 15   Global Step: 187210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:24,089-Speed 3003.67 samples/sec   Loss 2.7005   LearningRate 0.0061   Epoch: 15   Global Step: 187220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:27,535-Speed 2971.68 samples/sec   Loss 2.6408   LearningRate 0.0061   Epoch: 15   Global Step: 187230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:30,961-Speed 2989.69 samples/sec   Loss 2.6259   LearningRate 0.0061   Epoch: 15   Global Step: 187240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:34,388-Speed 2989.17 samples/sec   Loss 2.7086   LearningRate 0.0061   Epoch: 15   Global Step: 187250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:37,776-Speed 3022.99 samples/sec   Loss 2.7852   LearningRate 0.0061   Epoch: 15   Global Step: 187260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:41,094-Speed 3087.11 samples/sec   Loss 2.6973   LearningRate 0.0061   Epoch: 15   Global Step: 187270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:44,484-Speed 3021.70 samples/sec   Loss 2.7074   LearningRate 0.0061   Epoch: 15   Global Step: 187280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:47,854-Speed 3039.22 samples/sec   Loss 2.7144   LearningRate 0.0061   Epoch: 15   Global Step: 187290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:51,248-Speed 3017.96 samples/sec   Loss 2.6552   LearningRate 0.0061   Epoch: 15   Global Step: 187300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:27:54,637-Speed 3022.52 samples/sec   Loss 2.6280   LearningRate 0.0061   Epoch: 15   Global Step: 187310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:27:57,965-Speed 3077.15 samples/sec   Loss 2.6616   LearningRate 0.0060   Epoch: 15   Global Step: 187320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:28:01,321-Speed 3052.03 samples/sec   Loss 2.6632   LearningRate 0.0060   Epoch: 15   Global Step: 187330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:28:04,719-Speed 3014.29 samples/sec   Loss 2.6667   LearningRate 0.0060   Epoch: 15   Global Step: 187340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:28:08,220-Speed 2926.06 samples/sec   Loss 2.7062   LearningRate 0.0060   Epoch: 15   Global Step: 187350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:28:11,635-Speed 2998.69 samples/sec   Loss 2.7247   LearningRate 0.0060   Epoch: 15   Global Step: 187360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:28:14,993-Speed 3050.77 samples/sec   Loss 2.7056   LearningRate 0.0060   Epoch: 15   Global Step: 187370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:28:18,376-Speed 3028.03 samples/sec   Loss 2.6992   LearningRate 0.0060   Epoch: 15   Global Step: 187380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:28:21,740-Speed 3045.08 samples/sec   Loss 2.7527   LearningRate 0.0060   Epoch: 15   Global Step: 187390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:28:25,090-Speed 3057.65 samples/sec   Loss 2.6462   LearningRate 0.0060   Epoch: 15   Global Step: 187400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:28:28,433-Speed 3064.33 samples/sec   Loss 2.6736   LearningRate 0.0060   Epoch: 15   Global Step: 187410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:28:31,831-Speed 3014.32 samples/sec   Loss 2.7263   LearningRate 0.0060   Epoch: 15   Global Step: 187420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:28:35,189-Speed 3049.61 samples/sec   Loss 2.7938   LearningRate 0.0060   Epoch: 15   Global Step: 187430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:28:38,558-Speed 3041.05 samples/sec   Loss 2.7292   LearningRate 0.0060   Epoch: 15   Global Step: 187440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:28:41,949-Speed 3019.81 samples/sec   Loss 2.6928   LearningRate 0.0060   Epoch: 15   Global Step: 187450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:28:45,304-Speed 3053.49 samples/sec   Loss 2.7281   LearningRate 0.0060   Epoch: 15   Global Step: 187460   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:28:48,757-Speed 2965.98 samples/sec   Loss 2.6885   LearningRate 0.0060   Epoch: 15   Global Step: 187470   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:28:52,116-Speed 3049.64 samples/sec   Loss 2.7348   LearningRate 0.0060   Epoch: 15   Global Step: 187480   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:28:55,490-Speed 3036.31 samples/sec   Loss 2.7311   LearningRate 0.0060   Epoch: 15   Global Step: 187490   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:28:58,833-Speed 3064.46 samples/sec   Loss 2.7314   LearningRate 0.0060   Epoch: 15   Global Step: 187500   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:29:02,187-Speed 3053.28 samples/sec   Loss 2.7072   LearningRate 0.0060   Epoch: 15   Global Step: 187510   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:29:05,555-Speed 3041.29 samples/sec   Loss 2.7037   LearningRate 0.0060   Epoch: 15   Global Step: 187520   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:29:08,941-Speed 3026.44 samples/sec   Loss 2.7928   LearningRate 0.0060   Epoch: 15   Global Step: 187530   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:29:12,376-Speed 2981.05 samples/sec   Loss 2.6898   LearningRate 0.0060   Epoch: 15   Global Step: 187540   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:29:15,789-Speed 3001.18 samples/sec   Loss 2.7467   LearningRate 0.0060   Epoch: 15   Global Step: 187550   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 19:29:19,237-Speed 2971.01 samples/sec   Loss 2.7152   LearningRate 0.0060   Epoch: 15   Global Step: 187560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:29:22,624-Speed 3024.48 samples/sec   Loss 2.7186   LearningRate 0.0060   Epoch: 15   Global Step: 187570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:29:26,034-Speed 3003.49 samples/sec   Loss 2.8079   LearningRate 0.0060   Epoch: 15   Global Step: 187580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:29:29,397-Speed 3046.17 samples/sec   Loss 2.7236   LearningRate 0.0060   Epoch: 15   Global Step: 187590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:29:32,708-Speed 3092.99 samples/sec   Loss 2.7054   LearningRate 0.0060   Epoch: 15   Global Step: 187600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:29:36,114-Speed 3007.57 samples/sec   Loss 2.7483   LearningRate 0.0060   Epoch: 15   Global Step: 187610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:29:39,558-Speed 2973.58 samples/sec   Loss 2.7585   LearningRate 0.0060   Epoch: 15   Global Step: 187620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:29:42,918-Speed 3048.51 samples/sec   Loss 2.7133   LearningRate 0.0060   Epoch: 15   Global Step: 187630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:29:46,367-Speed 2970.24 samples/sec   Loss 2.6792   LearningRate 0.0060   Epoch: 15   Global Step: 187640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:29:49,732-Speed 3043.57 samples/sec   Loss 2.6685   LearningRate 0.0060   Epoch: 15   Global Step: 187650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:29:53,171-Speed 2978.84 samples/sec   Loss 2.7264   LearningRate 0.0060   Epoch: 15   Global Step: 187660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:29:56,595-Speed 2991.77 samples/sec   Loss 2.6828   LearningRate 0.0060   Epoch: 15   Global Step: 187670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:29:59,959-Speed 3045.01 samples/sec   Loss 2.7938   LearningRate 0.0060   Epoch: 15   Global Step: 187680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:30:03,310-Speed 3057.00 samples/sec   Loss 2.7071   LearningRate 0.0060   Epoch: 15   Global Step: 187690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:30:06,640-Speed 3075.63 samples/sec   Loss 2.7805   LearningRate 0.0060   Epoch: 15   Global Step: 187700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:30:10,039-Speed 3013.89 samples/sec   Loss 2.7935   LearningRate 0.0060   Epoch: 15   Global Step: 187710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:30:13,397-Speed 3050.48 samples/sec   Loss 2.7229   LearningRate 0.0060   Epoch: 15   Global Step: 187720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:30:16,756-Speed 3049.01 samples/sec   Loss 2.7339   LearningRate 0.0060   Epoch: 15   Global Step: 187730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:30:20,127-Speed 3038.92 samples/sec   Loss 2.7740   LearningRate 0.0060   Epoch: 15   Global Step: 187740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:30:23,537-Speed 3003.77 samples/sec   Loss 2.7178   LearningRate 0.0060   Epoch: 15   Global Step: 187750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:30:26,972-Speed 2981.97 samples/sec   Loss 2.7510   LearningRate 0.0060   Epoch: 15   Global Step: 187760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:30:30,380-Speed 3006.90 samples/sec   Loss 2.6870   LearningRate 0.0060   Epoch: 15   Global Step: 187770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:30:33,837-Speed 2963.14 samples/sec   Loss 2.7640   LearningRate 0.0060   Epoch: 15   Global Step: 187780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:30:37,178-Speed 3066.34 samples/sec   Loss 2.7220   LearningRate 0.0060   Epoch: 15   Global Step: 187790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:30:40,591-Speed 3001.08 samples/sec   Loss 2.8013   LearningRate 0.0060   Epoch: 15   Global Step: 187800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:30:44,022-Speed 2984.98 samples/sec   Loss 2.7935   LearningRate 0.0060   Epoch: 15   Global Step: 187810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:30:47,335-Speed 3091.89 samples/sec   Loss 2.7501   LearningRate 0.0060   Epoch: 15   Global Step: 187820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:30:50,775-Speed 2978.10 samples/sec   Loss 2.7241   LearningRate 0.0059   Epoch: 15   Global Step: 187830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:30:54,146-Speed 3038.57 samples/sec   Loss 2.8471   LearningRate 0.0059   Epoch: 15   Global Step: 187840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:30:57,506-Speed 3048.06 samples/sec   Loss 2.7652   LearningRate 0.0059   Epoch: 15   Global Step: 187850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:00,901-Speed 3016.89 samples/sec   Loss 2.7216   LearningRate 0.0059   Epoch: 15   Global Step: 187860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:04,245-Speed 3063.50 samples/sec   Loss 2.7138   LearningRate 0.0059   Epoch: 15   Global Step: 187870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:07,563-Speed 3086.93 samples/sec   Loss 2.8272   LearningRate 0.0059   Epoch: 15   Global Step: 187880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:31:10,916-Speed 3055.06 samples/sec   Loss 2.7993   LearningRate 0.0059   Epoch: 15   Global Step: 187890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:14,280-Speed 3044.56 samples/sec   Loss 2.7942   LearningRate 0.0059   Epoch: 15   Global Step: 187900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:17,657-Speed 3033.42 samples/sec   Loss 2.8220   LearningRate 0.0059   Epoch: 15   Global Step: 187910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:21,084-Speed 2988.72 samples/sec   Loss 2.7778   LearningRate 0.0059   Epoch: 15   Global Step: 187920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:24,579-Speed 2931.38 samples/sec   Loss 2.8601   LearningRate 0.0059   Epoch: 15   Global Step: 187930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:27,967-Speed 3022.94 samples/sec   Loss 2.7977   LearningRate 0.0059   Epoch: 15   Global Step: 187940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:31,388-Speed 2994.01 samples/sec   Loss 2.7777   LearningRate 0.0059   Epoch: 15   Global Step: 187950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:34,729-Speed 3066.56 samples/sec   Loss 2.7568   LearningRate 0.0059   Epoch: 15   Global Step: 187960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:38,116-Speed 3024.38 samples/sec   Loss 2.7863   LearningRate 0.0059   Epoch: 15   Global Step: 187970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:41,568-Speed 2967.21 samples/sec   Loss 2.8627   LearningRate 0.0059   Epoch: 15   Global Step: 187980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:31:44,898-Speed 3075.50 samples/sec   Loss 2.7777   LearningRate 0.0059   Epoch: 15   Global Step: 187990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:31:48,234-Speed 3070.35 samples/sec   Loss 2.6840   LearningRate 0.0059   Epoch: 15   Global Step: 188000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:31:51,579-Speed 3062.19 samples/sec   Loss 2.7223   LearningRate 0.0059   Epoch: 15   Global Step: 188010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:31:55,019-Speed 2977.84 samples/sec   Loss 2.7462   LearningRate 0.0059   Epoch: 15   Global Step: 188020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:31:58,480-Speed 2959.32 samples/sec   Loss 2.7592   LearningRate 0.0059   Epoch: 15   Global Step: 188030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:32:01,776-Speed 3107.98 samples/sec   Loss 2.6994   LearningRate 0.0059   Epoch: 15   Global Step: 188040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:32:05,152-Speed 3034.11 samples/sec   Loss 2.8597   LearningRate 0.0059   Epoch: 15   Global Step: 188050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:32:08,564-Speed 3001.84 samples/sec   Loss 2.8398   LearningRate 0.0059   Epoch: 15   Global Step: 188060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:32:11,904-Speed 3066.58 samples/sec   Loss 2.7211   LearningRate 0.0059   Epoch: 15   Global Step: 188070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:32:15,310-Speed 3007.56 samples/sec   Loss 2.7584   LearningRate 0.0059   Epoch: 15   Global Step: 188080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:32:18,710-Speed 3012.33 samples/sec   Loss 2.7833   LearningRate 0.0059   Epoch: 15   Global Step: 188090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:32:22,043-Speed 3074.11 samples/sec   Loss 2.7978   LearningRate 0.0059   Epoch: 15   Global Step: 188100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:32:25,416-Speed 3036.26 samples/sec   Loss 2.8018   LearningRate 0.0059   Epoch: 15   Global Step: 188110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:32:28,826-Speed 3004.53 samples/sec   Loss 2.7610   LearningRate 0.0059   Epoch: 15   Global Step: 188120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:32:32,258-Speed 2984.21 samples/sec   Loss 2.8701   LearningRate 0.0059   Epoch: 15   Global Step: 188130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:32:35,681-Speed 2991.81 samples/sec   Loss 2.7274   LearningRate 0.0059   Epoch: 15   Global Step: 188140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:32:39,054-Speed 3036.86 samples/sec   Loss 2.8047   LearningRate 0.0059   Epoch: 15   Global Step: 188150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:32:42,455-Speed 3012.40 samples/sec   Loss 2.7376   LearningRate 0.0059   Epoch: 15   Global Step: 188160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:32:45,913-Speed 2961.68 samples/sec   Loss 2.8719   LearningRate 0.0059   Epoch: 15   Global Step: 188170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:32:49,329-Speed 2998.30 samples/sec   Loss 2.8023   LearningRate 0.0059   Epoch: 15   Global Step: 188180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:32:52,741-Speed 3002.03 samples/sec   Loss 2.8410   LearningRate 0.0059   Epoch: 15   Global Step: 188190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:32:56,151-Speed 3003.61 samples/sec   Loss 2.8474   LearningRate 0.0059   Epoch: 15   Global Step: 188200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:32:59,502-Speed 3057.29 samples/sec   Loss 2.7496   LearningRate 0.0059   Epoch: 15   Global Step: 188210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:02,885-Speed 3027.11 samples/sec   Loss 2.7745   LearningRate 0.0059   Epoch: 15   Global Step: 188220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:06,370-Speed 2939.24 samples/sec   Loss 2.7959   LearningRate 0.0059   Epoch: 15   Global Step: 188230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:09,806-Speed 2981.66 samples/sec   Loss 2.7995   LearningRate 0.0059   Epoch: 15   Global Step: 188240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:13,244-Speed 2979.43 samples/sec   Loss 2.8307   LearningRate 0.0059   Epoch: 15   Global Step: 188250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:33:16,632-Speed 3022.84 samples/sec   Loss 2.7693   LearningRate 0.0059   Epoch: 15   Global Step: 188260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:19,993-Speed 3047.93 samples/sec   Loss 2.8084   LearningRate 0.0059   Epoch: 15   Global Step: 188270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:23,334-Speed 3066.23 samples/sec   Loss 2.8010   LearningRate 0.0059   Epoch: 15   Global Step: 188280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:26,716-Speed 3028.80 samples/sec   Loss 2.8743   LearningRate 0.0059   Epoch: 15   Global Step: 188290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:30,078-Speed 3046.04 samples/sec   Loss 2.8495   LearningRate 0.0059   Epoch: 15   Global Step: 188300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:33,422-Speed 3062.97 samples/sec   Loss 2.8807   LearningRate 0.0059   Epoch: 15   Global Step: 188310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:36,801-Speed 3031.42 samples/sec   Loss 2.8195   LearningRate 0.0059   Epoch: 15   Global Step: 188320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:40,209-Speed 3006.04 samples/sec   Loss 2.8565   LearningRate 0.0059   Epoch: 15   Global Step: 188330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:43,616-Speed 3006.73 samples/sec   Loss 2.8439   LearningRate 0.0058   Epoch: 15   Global Step: 188340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:47,036-Speed 2994.33 samples/sec   Loss 2.7573   LearningRate 0.0058   Epoch: 15   Global Step: 188350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:33:50,453-Speed 2998.19 samples/sec   Loss 2.8887   LearningRate 0.0058   Epoch: 15   Global Step: 188360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:33:53,826-Speed 3036.66 samples/sec   Loss 2.8700   LearningRate 0.0058   Epoch: 15   Global Step: 188370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:33:57,241-Speed 2999.85 samples/sec   Loss 2.7705   LearningRate 0.0058   Epoch: 15   Global Step: 188380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:34:00,647-Speed 3007.53 samples/sec   Loss 2.8108   LearningRate 0.0058   Epoch: 15   Global Step: 188390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:03,984-Speed 3069.18 samples/sec   Loss 2.8094   LearningRate 0.0058   Epoch: 15   Global Step: 188400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:07,362-Speed 3031.33 samples/sec   Loss 2.9225   LearningRate 0.0058   Epoch: 15   Global Step: 188410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:10,714-Speed 3055.99 samples/sec   Loss 2.7835   LearningRate 0.0058   Epoch: 15   Global Step: 188420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:14,136-Speed 2993.10 samples/sec   Loss 2.9056   LearningRate 0.0058   Epoch: 15   Global Step: 188430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:17,560-Speed 2991.29 samples/sec   Loss 2.8169   LearningRate 0.0058   Epoch: 15   Global Step: 188440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:21,068-Speed 2920.70 samples/sec   Loss 2.8589   LearningRate 0.0058   Epoch: 15   Global Step: 188450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:24,474-Speed 3006.59 samples/sec   Loss 2.7661   LearningRate 0.0058   Epoch: 15   Global Step: 188460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:27,880-Speed 3007.42 samples/sec   Loss 2.8469   LearningRate 0.0058   Epoch: 15   Global Step: 188470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:31,310-Speed 2987.06 samples/sec   Loss 2.8291   LearningRate 0.0058   Epoch: 15   Global Step: 188480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:34,740-Speed 2985.84 samples/sec   Loss 2.8203   LearningRate 0.0058   Epoch: 15   Global Step: 188490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:34:38,126-Speed 3024.86 samples/sec   Loss 2.8309   LearningRate 0.0058   Epoch: 15   Global Step: 188500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:41,474-Speed 3059.32 samples/sec   Loss 2.7895   LearningRate 0.0058   Epoch: 15   Global Step: 188510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:44,881-Speed 3006.55 samples/sec   Loss 2.8098   LearningRate 0.0058   Epoch: 15   Global Step: 188520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:48,272-Speed 3020.45 samples/sec   Loss 2.9114   LearningRate 0.0058   Epoch: 15   Global Step: 188530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:51,712-Speed 2977.65 samples/sec   Loss 2.8851   LearningRate 0.0058   Epoch: 15   Global Step: 188540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:55,110-Speed 3014.09 samples/sec   Loss 2.8535   LearningRate 0.0058   Epoch: 15   Global Step: 188550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:34:58,452-Speed 3065.63 samples/sec   Loss 2.8205   LearningRate 0.0058   Epoch: 15   Global Step: 188560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:35:01,848-Speed 3016.14 samples/sec   Loss 2.7928   LearningRate 0.0058   Epoch: 15   Global Step: 188570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:35:05,257-Speed 3003.91 samples/sec   Loss 2.8725   LearningRate 0.0058   Epoch: 15   Global Step: 188580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:35:08,636-Speed 3031.52 samples/sec   Loss 2.8725   LearningRate 0.0058   Epoch: 15   Global Step: 188590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:35:12,092-Speed 2964.58 samples/sec   Loss 2.8257   LearningRate 0.0058   Epoch: 15   Global Step: 188600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:35:15,517-Speed 2990.06 samples/sec   Loss 2.8404   LearningRate 0.0058   Epoch: 15   Global Step: 188610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:35:18,934-Speed 2997.74 samples/sec   Loss 2.8032   LearningRate 0.0058   Epoch: 15   Global Step: 188620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:35:22,342-Speed 3006.11 samples/sec   Loss 2.8620   LearningRate 0.0058   Epoch: 15   Global Step: 188630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:35:25,828-Speed 2937.73 samples/sec   Loss 2.8324   LearningRate 0.0058   Epoch: 15   Global Step: 188640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:35:29,281-Speed 2966.81 samples/sec   Loss 2.8108   LearningRate 0.0058   Epoch: 15   Global Step: 188650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:35:32,692-Speed 3002.78 samples/sec   Loss 2.8407   LearningRate 0.0058   Epoch: 15   Global Step: 188660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:35:36,097-Speed 3008.17 samples/sec   Loss 2.7965   LearningRate 0.0058   Epoch: 15   Global Step: 188670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:35:39,574-Speed 2946.17 samples/sec   Loss 2.8462   LearningRate 0.0058   Epoch: 15   Global Step: 188680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:35:42,982-Speed 3005.71 samples/sec   Loss 2.7930   LearningRate 0.0058   Epoch: 15   Global Step: 188690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:35:46,366-Speed 3027.03 samples/sec   Loss 2.8059   LearningRate 0.0058   Epoch: 15   Global Step: 188700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:35:49,810-Speed 2973.60 samples/sec   Loss 2.8608   LearningRate 0.0058   Epoch: 15   Global Step: 188710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:35:53,263-Speed 2966.70 samples/sec   Loss 2.9484   LearningRate 0.0058   Epoch: 15   Global Step: 188720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:35:56,699-Speed 2980.98 samples/sec   Loss 2.8248   LearningRate 0.0058   Epoch: 15   Global Step: 188730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:36:00,128-Speed 2987.61 samples/sec   Loss 2.8993   LearningRate 0.0058   Epoch: 15   Global Step: 188740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:36:03,606-Speed 2944.82 samples/sec   Loss 2.9431   LearningRate 0.0058   Epoch: 15   Global Step: 188750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:36:06,999-Speed 3018.29 samples/sec   Loss 2.8852   LearningRate 0.0058   Epoch: 15   Global Step: 188760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:36:10,396-Speed 3015.50 samples/sec   Loss 2.7891   LearningRate 0.0058   Epoch: 15   Global Step: 188770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:36:13,806-Speed 3004.54 samples/sec   Loss 2.7934   LearningRate 0.0058   Epoch: 15   Global Step: 188780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:36:17,219-Speed 3001.08 samples/sec   Loss 2.9069   LearningRate 0.0058   Epoch: 15   Global Step: 188790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:36:20,618-Speed 3014.16 samples/sec   Loss 2.9364   LearningRate 0.0058   Epoch: 15   Global Step: 188800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:36:23,962-Speed 3064.49 samples/sec   Loss 2.8079   LearningRate 0.0058   Epoch: 15   Global Step: 188810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:36:27,358-Speed 3015.86 samples/sec   Loss 2.9031   LearningRate 0.0058   Epoch: 15   Global Step: 188820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:36:30,772-Speed 3003.30 samples/sec   Loss 2.8029   LearningRate 0.0058   Epoch: 15   Global Step: 188830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:36:34,112-Speed 3066.44 samples/sec   Loss 2.9220   LearningRate 0.0058   Epoch: 15   Global Step: 188840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:36:37,529-Speed 2996.89 samples/sec   Loss 2.8663   LearningRate 0.0058   Epoch: 15   Global Step: 188850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:36:40,905-Speed 3034.12 samples/sec   Loss 2.9010   LearningRate 0.0057   Epoch: 15   Global Step: 188860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:36:44,284-Speed 3032.03 samples/sec   Loss 2.9063   LearningRate 0.0057   Epoch: 15   Global Step: 188870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:36:47,646-Speed 3046.62 samples/sec   Loss 2.8612   LearningRate 0.0057   Epoch: 15   Global Step: 188880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:36:51,059-Speed 3001.13 samples/sec   Loss 2.8761   LearningRate 0.0057   Epoch: 15   Global Step: 188890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:36:54,578-Speed 2910.63 samples/sec   Loss 2.8743   LearningRate 0.0057   Epoch: 15   Global Step: 188900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:36:57,915-Speed 3069.34 samples/sec   Loss 2.9206   LearningRate 0.0057   Epoch: 15   Global Step: 188910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:37:01,276-Speed 3048.03 samples/sec   Loss 2.9271   LearningRate 0.0057   Epoch: 15   Global Step: 188920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:37:04,672-Speed 3016.42 samples/sec   Loss 2.9515   LearningRate 0.0057   Epoch: 15   Global Step: 188930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:37:08,043-Speed 3038.52 samples/sec   Loss 2.9252   LearningRate 0.0057   Epoch: 15   Global Step: 188940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:37:11,498-Speed 2965.68 samples/sec   Loss 2.8529   LearningRate 0.0057   Epoch: 15   Global Step: 188950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:37:14,876-Speed 3031.69 samples/sec   Loss 2.8974   LearningRate 0.0057   Epoch: 15   Global Step: 188960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:37:18,256-Speed 3030.36 samples/sec   Loss 2.8806   LearningRate 0.0057   Epoch: 15   Global Step: 188970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:37:21,710-Speed 2966.03 samples/sec   Loss 2.8755   LearningRate 0.0057   Epoch: 15   Global Step: 188980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:37:25,194-Speed 2939.55 samples/sec   Loss 2.9018   LearningRate 0.0057   Epoch: 15   Global Step: 188990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:37:28,618-Speed 2992.09 samples/sec   Loss 2.8960   LearningRate 0.0057   Epoch: 15   Global Step: 189000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:37:32,008-Speed 3022.02 samples/sec   Loss 2.8511   LearningRate 0.0057   Epoch: 15   Global Step: 189010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:37:35,460-Speed 2967.18 samples/sec   Loss 2.9116   LearningRate 0.0057   Epoch: 15   Global Step: 189020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:37:38,920-Speed 2961.38 samples/sec   Loss 2.8689   LearningRate 0.0057   Epoch: 15   Global Step: 189030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:37:42,297-Speed 3033.02 samples/sec   Loss 2.8610   LearningRate 0.0057   Epoch: 15   Global Step: 189040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:37:45,744-Speed 2972.10 samples/sec   Loss 2.8480   LearningRate 0.0057   Epoch: 15   Global Step: 189050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:37:49,165-Speed 2993.90 samples/sec   Loss 2.8895   LearningRate 0.0057   Epoch: 15   Global Step: 189060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:37:52,586-Speed 2994.48 samples/sec   Loss 2.8941   LearningRate 0.0057   Epoch: 15   Global Step: 189070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:37:55,991-Speed 3007.82 samples/sec   Loss 2.8930   LearningRate 0.0057   Epoch: 15   Global Step: 189080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:37:59,379-Speed 3023.01 samples/sec   Loss 2.9874   LearningRate 0.0057   Epoch: 15   Global Step: 189090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:02,818-Speed 2978.98 samples/sec   Loss 2.9016   LearningRate 0.0057   Epoch: 15   Global Step: 189100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:06,135-Speed 3087.80 samples/sec   Loss 2.8122   LearningRate 0.0057   Epoch: 15   Global Step: 189110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:09,586-Speed 2968.05 samples/sec   Loss 2.9039   LearningRate 0.0057   Epoch: 15   Global Step: 189120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:12,984-Speed 3014.09 samples/sec   Loss 2.8560   LearningRate 0.0057   Epoch: 15   Global Step: 189130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:16,396-Speed 3002.52 samples/sec   Loss 2.9040   LearningRate 0.0057   Epoch: 15   Global Step: 189140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:19,714-Speed 3086.55 samples/sec   Loss 2.8507   LearningRate 0.0057   Epoch: 15   Global Step: 189150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:23,126-Speed 3001.90 samples/sec   Loss 2.8619   LearningRate 0.0057   Epoch: 15   Global Step: 189160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:26,510-Speed 3027.19 samples/sec   Loss 2.9056   LearningRate 0.0057   Epoch: 15   Global Step: 189170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:29,875-Speed 3044.18 samples/sec   Loss 2.8515   LearningRate 0.0057   Epoch: 15   Global Step: 189180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:38:33,254-Speed 3030.78 samples/sec   Loss 2.8657   LearningRate 0.0057   Epoch: 15   Global Step: 189190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:36,703-Speed 2969.61 samples/sec   Loss 2.9267   LearningRate 0.0057   Epoch: 15   Global Step: 189200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:40,070-Speed 3042.47 samples/sec   Loss 2.8798   LearningRate 0.0057   Epoch: 15   Global Step: 189210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:43,487-Speed 2997.70 samples/sec   Loss 2.8434   LearningRate 0.0057   Epoch: 15   Global Step: 189220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:46,838-Speed 3057.12 samples/sec   Loss 2.9664   LearningRate 0.0057   Epoch: 15   Global Step: 189230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:50,217-Speed 3031.11 samples/sec   Loss 2.8148   LearningRate 0.0057   Epoch: 15   Global Step: 189240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:53,584-Speed 3041.71 samples/sec   Loss 2.8752   LearningRate 0.0057   Epoch: 15   Global Step: 189250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:38:56,930-Speed 3061.18 samples/sec   Loss 2.8806   LearningRate 0.0057   Epoch: 15   Global Step: 189260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:00,261-Speed 3074.89 samples/sec   Loss 2.8802   LearningRate 0.0057   Epoch: 15   Global Step: 189270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:03,623-Speed 3046.84 samples/sec   Loss 2.8552   LearningRate 0.0057   Epoch: 15   Global Step: 189280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:07,018-Speed 3016.85 samples/sec   Loss 2.8600   LearningRate 0.0057   Epoch: 15   Global Step: 189290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:39:10,418-Speed 3012.62 samples/sec   Loss 2.8806   LearningRate 0.0057   Epoch: 15   Global Step: 189300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:39:13,890-Speed 2950.59 samples/sec   Loss 2.9366   LearningRate 0.0057   Epoch: 15   Global Step: 189310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:39:17,326-Speed 2981.14 samples/sec   Loss 2.8506   LearningRate 0.0057   Epoch: 15   Global Step: 189320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:39:20,704-Speed 3032.35 samples/sec   Loss 2.9660   LearningRate 0.0057   Epoch: 15   Global Step: 189330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:39:24,110-Speed 3007.36 samples/sec   Loss 2.9193   LearningRate 0.0057   Epoch: 15   Global Step: 189340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:27,552-Speed 2975.77 samples/sec   Loss 2.8884   LearningRate 0.0057   Epoch: 15   Global Step: 189350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:30,935-Speed 3027.49 samples/sec   Loss 2.9279   LearningRate 0.0057   Epoch: 15   Global Step: 189360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:34,351-Speed 2998.91 samples/sec   Loss 2.9006   LearningRate 0.0057   Epoch: 15   Global Step: 189370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:37,795-Speed 2973.41 samples/sec   Loss 2.8923   LearningRate 0.0056   Epoch: 15   Global Step: 189380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:41,204-Speed 3004.76 samples/sec   Loss 2.8865   LearningRate 0.0056   Epoch: 15   Global Step: 189390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:44,615-Speed 3003.12 samples/sec   Loss 2.8235   LearningRate 0.0056   Epoch: 15   Global Step: 189400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:47,995-Speed 3030.18 samples/sec   Loss 2.8843   LearningRate 0.0056   Epoch: 15   Global Step: 189410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:51,480-Speed 2939.36 samples/sec   Loss 2.9374   LearningRate 0.0056   Epoch: 15   Global Step: 189420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:54,899-Speed 2995.63 samples/sec   Loss 2.8477   LearningRate 0.0056   Epoch: 15   Global Step: 189430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:39:58,267-Speed 3041.32 samples/sec   Loss 2.9583   LearningRate 0.0056   Epoch: 15   Global Step: 189440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:40:01,639-Speed 3037.24 samples/sec   Loss 2.9773   LearningRate 0.0056   Epoch: 15   Global Step: 189450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:40:05,066-Speed 2989.21 samples/sec   Loss 2.9149   LearningRate 0.0056   Epoch: 15   Global Step: 189460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:40:08,378-Speed 3092.82 samples/sec   Loss 2.8128   LearningRate 0.0056   Epoch: 15   Global Step: 189470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:40:11,909-Speed 2901.03 samples/sec   Loss 2.9907   LearningRate 0.0056   Epoch: 15   Global Step: 189480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:40:15,312-Speed 3009.83 samples/sec   Loss 2.9114   LearningRate 0.0056   Epoch: 15   Global Step: 189490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:40:18,724-Speed 3002.06 samples/sec   Loss 2.8892   LearningRate 0.0056   Epoch: 15   Global Step: 189500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:40:22,158-Speed 2983.02 samples/sec   Loss 2.8813   LearningRate 0.0056   Epoch: 15   Global Step: 189510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:40:25,517-Speed 3048.74 samples/sec   Loss 2.9500   LearningRate 0.0056   Epoch: 15   Global Step: 189520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:40:28,921-Speed 3009.61 samples/sec   Loss 2.9068   LearningRate 0.0056   Epoch: 15   Global Step: 189530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:40:32,315-Speed 3017.30 samples/sec   Loss 2.9468   LearningRate 0.0056   Epoch: 15   Global Step: 189540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:40:35,672-Speed 3051.67 samples/sec   Loss 2.8769   LearningRate 0.0056   Epoch: 15   Global Step: 189550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:40:39,089-Speed 2998.06 samples/sec   Loss 2.9071   LearningRate 0.0056   Epoch: 15   Global Step: 189560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:40:42,444-Speed 3052.71 samples/sec   Loss 2.9696   LearningRate 0.0056   Epoch: 15   Global Step: 189570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:40:45,864-Speed 2994.77 samples/sec   Loss 2.8951   LearningRate 0.0056   Epoch: 15   Global Step: 189580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:40:49,269-Speed 3008.16 samples/sec   Loss 2.9602   LearningRate 0.0056   Epoch: 15   Global Step: 189590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:40:52,646-Speed 3033.24 samples/sec   Loss 2.9731   LearningRate 0.0056   Epoch: 15   Global Step: 189600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:40:56,117-Speed 2950.55 samples/sec   Loss 2.8891   LearningRate 0.0056   Epoch: 15   Global Step: 189610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:40:59,532-Speed 2999.36 samples/sec   Loss 2.9503   LearningRate 0.0056   Epoch: 15   Global Step: 189620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:41:02,919-Speed 3024.88 samples/sec   Loss 2.9380   LearningRate 0.0056   Epoch: 15   Global Step: 189630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:41:06,283-Speed 3043.88 samples/sec   Loss 2.8293   LearningRate 0.0056   Epoch: 15   Global Step: 189640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:41:09,585-Speed 3102.45 samples/sec   Loss 2.9765   LearningRate 0.0056   Epoch: 15   Global Step: 189650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:41:12,935-Speed 3057.96 samples/sec   Loss 2.9439   LearningRate 0.0056   Epoch: 15   Global Step: 189660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:41:16,306-Speed 3039.70 samples/sec   Loss 2.9463   LearningRate 0.0056   Epoch: 15   Global Step: 189670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:41:19,654-Speed 3060.08 samples/sec   Loss 2.9845   LearningRate 0.0056   Epoch: 15   Global Step: 189680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:41:23,084-Speed 2985.58 samples/sec   Loss 2.9140   LearningRate 0.0056   Epoch: 15   Global Step: 189690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:41:26,415-Speed 3074.97 samples/sec   Loss 2.9288   LearningRate 0.0056   Epoch: 15   Global Step: 189700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:41:29,778-Speed 3046.21 samples/sec   Loss 2.9300   LearningRate 0.0056   Epoch: 15   Global Step: 189710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:41:33,167-Speed 3022.89 samples/sec   Loss 2.9603   LearningRate 0.0056   Epoch: 15   Global Step: 189720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:41:36,612-Speed 2973.06 samples/sec   Loss 2.8758   LearningRate 0.0056   Epoch: 15   Global Step: 189730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:41:40,040-Speed 2988.15 samples/sec   Loss 2.9697   LearningRate 0.0056   Epoch: 15   Global Step: 189740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:41:43,476-Speed 2981.02 samples/sec   Loss 2.9217   LearningRate 0.0056   Epoch: 15   Global Step: 189750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:41:46,953-Speed 2946.05 samples/sec   Loss 2.8924   LearningRate 0.0056   Epoch: 15   Global Step: 189760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:41:50,356-Speed 3009.10 samples/sec   Loss 3.0379   LearningRate 0.0056   Epoch: 15   Global Step: 189770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:41:53,720-Speed 3045.08 samples/sec   Loss 2.9839   LearningRate 0.0056   Epoch: 15   Global Step: 189780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:41:57,009-Speed 3114.51 samples/sec   Loss 2.8862   LearningRate 0.0056   Epoch: 15   Global Step: 189790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:42:00,465-Speed 2963.32 samples/sec   Loss 2.9339   LearningRate 0.0056   Epoch: 15   Global Step: 189800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:42:03,801-Speed 3070.40 samples/sec   Loss 2.9817   LearningRate 0.0056   Epoch: 15   Global Step: 189810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:42:07,216-Speed 2999.42 samples/sec   Loss 2.9870   LearningRate 0.0056   Epoch: 15   Global Step: 189820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:42:10,580-Speed 3045.28 samples/sec   Loss 2.8137   LearningRate 0.0056   Epoch: 15   Global Step: 189830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:42:13,934-Speed 3053.56 samples/sec   Loss 2.9507   LearningRate 0.0056   Epoch: 15   Global Step: 189840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:42:17,245-Speed 3093.38 samples/sec   Loss 2.9523   LearningRate 0.0056   Epoch: 15   Global Step: 189850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:42:20,596-Speed 3056.79 samples/sec   Loss 3.0023   LearningRate 0.0056   Epoch: 15   Global Step: 189860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:42:23,962-Speed 3043.63 samples/sec   Loss 2.9006   LearningRate 0.0056   Epoch: 15   Global Step: 189870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:42:27,354-Speed 3019.35 samples/sec   Loss 3.0141   LearningRate 0.0056   Epoch: 15   Global Step: 189880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:42:30,804-Speed 2969.32 samples/sec   Loss 2.8805   LearningRate 0.0056   Epoch: 15   Global Step: 189890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:42:34,197-Speed 3018.71 samples/sec   Loss 2.9282   LearningRate 0.0055   Epoch: 15   Global Step: 189900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:42:37,515-Speed 3087.41 samples/sec   Loss 2.9768   LearningRate 0.0055   Epoch: 15   Global Step: 189910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:42:40,828-Speed 3091.05 samples/sec   Loss 3.0011   LearningRate 0.0055   Epoch: 15   Global Step: 189920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:42:44,170-Speed 3065.17 samples/sec   Loss 3.0029   LearningRate 0.0055   Epoch: 15   Global Step: 189930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:42:47,569-Speed 3013.73 samples/sec   Loss 2.9646   LearningRate 0.0055   Epoch: 15   Global Step: 189940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:42:50,975-Speed 3007.23 samples/sec   Loss 2.9903   LearningRate 0.0055   Epoch: 15   Global Step: 189950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:42:54,360-Speed 3026.11 samples/sec   Loss 3.0375   LearningRate 0.0055   Epoch: 15   Global Step: 189960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:42:57,733-Speed 3036.75 samples/sec   Loss 3.0134   LearningRate 0.0055   Epoch: 15   Global Step: 189970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:43:01,134-Speed 3011.61 samples/sec   Loss 3.0183   LearningRate 0.0055   Epoch: 15   Global Step: 189980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:43:04,516-Speed 3028.46 samples/sec   Loss 2.8924   LearningRate 0.0055   Epoch: 15   Global Step: 189990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:43:07,883-Speed 3042.87 samples/sec   Loss 2.9600   LearningRate 0.0055   Epoch: 15   Global Step: 190000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:43:11,296-Speed 3000.64 samples/sec   Loss 2.9184   LearningRate 0.0055   Epoch: 15   Global Step: 190010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:43:14,696-Speed 3012.88 samples/sec   Loss 2.9585   LearningRate 0.0055   Epoch: 15   Global Step: 190020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:43:18,082-Speed 3025.25 samples/sec   Loss 2.9576   LearningRate 0.0055   Epoch: 15   Global Step: 190030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:43:21,457-Speed 3035.20 samples/sec   Loss 2.8943   LearningRate 0.0055   Epoch: 15   Global Step: 190040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:43:24,818-Speed 3046.64 samples/sec   Loss 2.9398   LearningRate 0.0055   Epoch: 15   Global Step: 190050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:43:28,225-Speed 3007.16 samples/sec   Loss 2.8755   LearningRate 0.0055   Epoch: 15   Global Step: 190060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:43:31,635-Speed 3003.26 samples/sec   Loss 3.0154   LearningRate 0.0055   Epoch: 15   Global Step: 190070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:43:35,064-Speed 2986.90 samples/sec   Loss 2.9527   LearningRate 0.0055   Epoch: 15   Global Step: 190080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:43:38,521-Speed 2963.08 samples/sec   Loss 2.9888   LearningRate 0.0055   Epoch: 15   Global Step: 190090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 19:43:41,892-Speed 3038.89 samples/sec   Loss 2.9247   LearningRate 0.0055   Epoch: 15   Global Step: 190100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:43:45,345-Speed 2966.12 samples/sec   Loss 2.9040   LearningRate 0.0055   Epoch: 15   Global Step: 190110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:43:48,748-Speed 3009.87 samples/sec   Loss 2.9097   LearningRate 0.0055   Epoch: 15   Global Step: 190120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:43:52,213-Speed 2956.45 samples/sec   Loss 2.9937   LearningRate 0.0055   Epoch: 15   Global Step: 190130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:43:55,592-Speed 3032.20 samples/sec   Loss 2.9687   LearningRate 0.0055   Epoch: 15   Global Step: 190140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:43:58,927-Speed 3071.26 samples/sec   Loss 2.9828   LearningRate 0.0055   Epoch: 15   Global Step: 190150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:44:02,339-Speed 3001.87 samples/sec   Loss 2.9356   LearningRate 0.0055   Epoch: 15   Global Step: 190160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:44:05,710-Speed 3038.52 samples/sec   Loss 3.0362   LearningRate 0.0055   Epoch: 15   Global Step: 190170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:44:09,078-Speed 3041.13 samples/sec   Loss 2.9346   LearningRate 0.0055   Epoch: 15   Global Step: 190180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:44:12,431-Speed 3055.04 samples/sec   Loss 2.9459   LearningRate 0.0055   Epoch: 15   Global Step: 190190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 19:44:15,841-Speed 3004.28 samples/sec   Loss 3.0600   LearningRate 0.0055   Epoch: 15   Global Step: 190200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 19:44:19,179-Speed 3068.37 samples/sec   Loss 3.0355   LearningRate 0.0055   Epoch: 15   Global Step: 190210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:44:22,508-Speed 3077.05 samples/sec   Loss 2.9548   LearningRate 0.0055   Epoch: 15   Global Step: 190220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:44:25,961-Speed 2966.09 samples/sec   Loss 2.9755   LearningRate 0.0055   Epoch: 15   Global Step: 190230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:44:29,401-Speed 2977.86 samples/sec   Loss 3.0551   LearningRate 0.0055   Epoch: 15   Global Step: 190240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:44:32,768-Speed 3042.00 samples/sec   Loss 3.0046   LearningRate 0.0055   Epoch: 15   Global Step: 190250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:44:36,155-Speed 3024.26 samples/sec   Loss 2.9499   LearningRate 0.0055   Epoch: 15   Global Step: 190260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:44:39,478-Speed 3081.72 samples/sec   Loss 2.9786   LearningRate 0.0055   Epoch: 15   Global Step: 190270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:44:42,922-Speed 2974.81 samples/sec   Loss 2.9701   LearningRate 0.0055   Epoch: 15   Global Step: 190280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:44:46,309-Speed 3023.89 samples/sec   Loss 3.0331   LearningRate 0.0055   Epoch: 15   Global Step: 190290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:44:49,639-Speed 3076.46 samples/sec   Loss 2.9703   LearningRate 0.0055   Epoch: 15   Global Step: 190300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:44:52,962-Speed 3082.85 samples/sec   Loss 2.8782   LearningRate 0.0055   Epoch: 15   Global Step: 190310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:44:56,354-Speed 3019.53 samples/sec   Loss 3.0404   LearningRate 0.0055   Epoch: 15   Global Step: 190320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:44:59,736-Speed 3028.45 samples/sec   Loss 3.0124   LearningRate 0.0055   Epoch: 15   Global Step: 190330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:45:03,081-Speed 3062.24 samples/sec   Loss 2.9768   LearningRate 0.0055   Epoch: 15   Global Step: 190340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:45:06,415-Speed 3071.64 samples/sec   Loss 3.0095   LearningRate 0.0055   Epoch: 15   Global Step: 190350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:45:09,785-Speed 3039.86 samples/sec   Loss 2.9752   LearningRate 0.0055   Epoch: 15   Global Step: 190360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:45:13,115-Speed 3075.40 samples/sec   Loss 3.0084   LearningRate 0.0055   Epoch: 15   Global Step: 190370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:45:16,573-Speed 2962.35 samples/sec   Loss 2.9586   LearningRate 0.0055   Epoch: 15   Global Step: 190380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:45:19,896-Speed 3082.54 samples/sec   Loss 3.0727   LearningRate 0.0055   Epoch: 15   Global Step: 190390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:45:23,239-Speed 3063.25 samples/sec   Loss 3.0048   LearningRate 0.0055   Epoch: 15   Global Step: 190400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:45:26,566-Speed 3079.36 samples/sec   Loss 2.9276   LearningRate 0.0055   Epoch: 15   Global Step: 190410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:45:29,887-Speed 3084.58 samples/sec   Loss 2.9998   LearningRate 0.0055   Epoch: 15   Global Step: 190420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:45:33,276-Speed 3022.11 samples/sec   Loss 2.9932   LearningRate 0.0054   Epoch: 15   Global Step: 190430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:45:36,729-Speed 2966.43 samples/sec   Loss 3.0077   LearningRate 0.0054   Epoch: 15   Global Step: 190440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:45:40,171-Speed 2975.35 samples/sec   Loss 2.9244   LearningRate 0.0054   Epoch: 15   Global Step: 190450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:45:43,512-Speed 3066.67 samples/sec   Loss 3.0056   LearningRate 0.0054   Epoch: 15   Global Step: 190460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:45:46,884-Speed 3036.84 samples/sec   Loss 2.9201   LearningRate 0.0054   Epoch: 15   Global Step: 190470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:45:50,267-Speed 3028.30 samples/sec   Loss 2.9782   LearningRate 0.0054   Epoch: 15   Global Step: 190480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:45:53,669-Speed 3010.46 samples/sec   Loss 2.9886   LearningRate 0.0054   Epoch: 15   Global Step: 190490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:45:57,008-Speed 3068.03 samples/sec   Loss 2.9819   LearningRate 0.0054   Epoch: 15   Global Step: 190500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:00,484-Speed 2947.19 samples/sec   Loss 3.0755   LearningRate 0.0054   Epoch: 15   Global Step: 190510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:03,831-Speed 3060.26 samples/sec   Loss 3.0420   LearningRate 0.0054   Epoch: 15   Global Step: 190520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:07,214-Speed 3027.47 samples/sec   Loss 2.9713   LearningRate 0.0054   Epoch: 15   Global Step: 190530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:10,624-Speed 3004.23 samples/sec   Loss 2.9651   LearningRate 0.0054   Epoch: 15   Global Step: 190540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:14,083-Speed 2960.83 samples/sec   Loss 2.9294   LearningRate 0.0054   Epoch: 15   Global Step: 190550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:17,536-Speed 2966.72 samples/sec   Loss 2.9136   LearningRate 0.0054   Epoch: 15   Global Step: 190560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:20,923-Speed 3024.22 samples/sec   Loss 2.9549   LearningRate 0.0054   Epoch: 15   Global Step: 190570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:24,252-Speed 3077.12 samples/sec   Loss 2.9760   LearningRate 0.0054   Epoch: 15   Global Step: 190580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:27,563-Speed 3093.41 samples/sec   Loss 3.0188   LearningRate 0.0054   Epoch: 15   Global Step: 190590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:30,944-Speed 3030.09 samples/sec   Loss 2.9719   LearningRate 0.0054   Epoch: 15   Global Step: 190600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:34,340-Speed 3015.74 samples/sec   Loss 2.9834   LearningRate 0.0054   Epoch: 15   Global Step: 190610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:37,793-Speed 2966.17 samples/sec   Loss 2.9866   LearningRate 0.0054   Epoch: 15   Global Step: 190620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:41,211-Speed 2997.21 samples/sec   Loss 2.9679   LearningRate 0.0054   Epoch: 15   Global Step: 190630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:44,597-Speed 3024.93 samples/sec   Loss 3.0506   LearningRate 0.0054   Epoch: 15   Global Step: 190640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:48,001-Speed 3008.51 samples/sec   Loss 2.9114   LearningRate 0.0054   Epoch: 15   Global Step: 190650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:51,381-Speed 3030.25 samples/sec   Loss 3.0135   LearningRate 0.0054   Epoch: 15   Global Step: 190660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:54,727-Speed 3061.92 samples/sec   Loss 2.9888   LearningRate 0.0054   Epoch: 15   Global Step: 190670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:46:58,062-Speed 3071.47 samples/sec   Loss 3.0065   LearningRate 0.0054   Epoch: 15   Global Step: 190680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:01,547-Speed 2939.29 samples/sec   Loss 3.0220   LearningRate 0.0054   Epoch: 15   Global Step: 190690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:47:05,041-Speed 2931.11 samples/sec   Loss 2.9956   LearningRate 0.0054   Epoch: 15   Global Step: 190700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:47:08,510-Speed 2952.46 samples/sec   Loss 2.9842   LearningRate 0.0054   Epoch: 15   Global Step: 190710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:11,893-Speed 3028.15 samples/sec   Loss 3.0167   LearningRate 0.0054   Epoch: 15   Global Step: 190720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:15,300-Speed 3006.58 samples/sec   Loss 2.9116   LearningRate 0.0054   Epoch: 15   Global Step: 190730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:18,702-Speed 3010.35 samples/sec   Loss 3.0478   LearningRate 0.0054   Epoch: 15   Global Step: 190740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:22,139-Speed 2980.58 samples/sec   Loss 2.9591   LearningRate 0.0054   Epoch: 15   Global Step: 190750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:25,663-Speed 2906.91 samples/sec   Loss 2.9943   LearningRate 0.0054   Epoch: 15   Global Step: 190760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:29,064-Speed 3011.24 samples/sec   Loss 2.9190   LearningRate 0.0054   Epoch: 15   Global Step: 190770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:32,492-Speed 2988.75 samples/sec   Loss 3.0649   LearningRate 0.0054   Epoch: 15   Global Step: 190780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:35,893-Speed 3012.97 samples/sec   Loss 2.9575   LearningRate 0.0054   Epoch: 15   Global Step: 190790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:39,242-Speed 3058.11 samples/sec   Loss 2.9970   LearningRate 0.0054   Epoch: 15   Global Step: 190800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:42,691-Speed 2969.90 samples/sec   Loss 2.9739   LearningRate 0.0054   Epoch: 15   Global Step: 190810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:47:46,046-Speed 3052.95 samples/sec   Loss 2.9692   LearningRate 0.0054   Epoch: 15   Global Step: 190820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:47:49,457-Speed 3002.64 samples/sec   Loss 3.0110   LearningRate 0.0054   Epoch: 15   Global Step: 190830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:52,864-Speed 3006.56 samples/sec   Loss 2.9771   LearningRate 0.0054   Epoch: 15   Global Step: 190840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:56,253-Speed 3022.54 samples/sec   Loss 3.0407   LearningRate 0.0054   Epoch: 15   Global Step: 190850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:47:59,613-Speed 3048.69 samples/sec   Loss 2.9778   LearningRate 0.0054   Epoch: 15   Global Step: 190860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:48:02,965-Speed 3055.84 samples/sec   Loss 2.9962   LearningRate 0.0054   Epoch: 15   Global Step: 190870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:48:06,263-Speed 3105.98 samples/sec   Loss 3.0443   LearningRate 0.0054   Epoch: 15   Global Step: 190880   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:48:09,647-Speed 3026.97 samples/sec   Loss 3.0605   LearningRate 0.0054   Epoch: 15   Global Step: 190890   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:48:13,008-Speed 3047.18 samples/sec   Loss 3.0001   LearningRate 0.0054   Epoch: 15   Global Step: 190900   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:48:16,457-Speed 2970.46 samples/sec   Loss 2.9728   LearningRate 0.0054   Epoch: 15   Global Step: 190910   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:48:19,898-Speed 2976.82 samples/sec   Loss 2.9985   LearningRate 0.0054   Epoch: 15   Global Step: 190920   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:48:23,244-Speed 3061.28 samples/sec   Loss 2.9982   LearningRate 0.0054   Epoch: 15   Global Step: 190930   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:48:26,651-Speed 3006.22 samples/sec   Loss 3.0466   LearningRate 0.0054   Epoch: 15   Global Step: 190940   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:48:30,059-Speed 3005.16 samples/sec   Loss 3.0637   LearningRate 0.0054   Epoch: 15   Global Step: 190950   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:48:33,464-Speed 3008.90 samples/sec   Loss 3.0313   LearningRate 0.0054   Epoch: 15   Global Step: 190960   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:48:36,791-Speed 3077.90 samples/sec   Loss 2.9899   LearningRate 0.0053   Epoch: 15   Global Step: 190970   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:48:40,202-Speed 3003.80 samples/sec   Loss 3.0228   LearningRate 0.0053   Epoch: 15   Global Step: 190980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:48:43,532-Speed 3075.87 samples/sec   Loss 3.0532   LearningRate 0.0053   Epoch: 15   Global Step: 190990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:48:47,003-Speed 2951.05 samples/sec   Loss 3.0496   LearningRate 0.0053   Epoch: 15   Global Step: 191000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:48:50,412-Speed 3004.26 samples/sec   Loss 3.0359   LearningRate 0.0053   Epoch: 15   Global Step: 191010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:48:53,877-Speed 2956.06 samples/sec   Loss 2.9545   LearningRate 0.0053   Epoch: 15   Global Step: 191020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:48:57,247-Speed 3039.63 samples/sec   Loss 3.0905   LearningRate 0.0053   Epoch: 15   Global Step: 191030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:49:00,677-Speed 2986.48 samples/sec   Loss 2.9925   LearningRate 0.0053   Epoch: 15   Global Step: 191040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:49:04,174-Speed 2929.17 samples/sec   Loss 3.0305   LearningRate 0.0053   Epoch: 15   Global Step: 191050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:49:07,540-Speed 3043.30 samples/sec   Loss 3.0276   LearningRate 0.0053   Epoch: 15   Global Step: 191060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:49:10,897-Speed 3051.01 samples/sec   Loss 3.0170   LearningRate 0.0053   Epoch: 15   Global Step: 191070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:49:14,285-Speed 3023.14 samples/sec   Loss 3.0439   LearningRate 0.0053   Epoch: 15   Global Step: 191080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:49:17,688-Speed 3010.56 samples/sec   Loss 2.9754   LearningRate 0.0053   Epoch: 15   Global Step: 191090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:49:21,013-Speed 3080.43 samples/sec   Loss 3.0194   LearningRate 0.0053   Epoch: 15   Global Step: 191100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:49:24,365-Speed 3055.24 samples/sec   Loss 2.9645   LearningRate 0.0053   Epoch: 15   Global Step: 191110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:49:27,725-Speed 3048.38 samples/sec   Loss 3.0682   LearningRate 0.0053   Epoch: 15   Global Step: 191120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:49:31,087-Speed 3047.43 samples/sec   Loss 2.9263   LearningRate 0.0053   Epoch: 15   Global Step: 191130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:49:34,392-Speed 3098.74 samples/sec   Loss 3.0530   LearningRate 0.0053   Epoch: 15   Global Step: 191140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:49:37,774-Speed 3029.10 samples/sec   Loss 3.0142   LearningRate 0.0053   Epoch: 15   Global Step: 191150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:49:41,210-Speed 2980.87 samples/sec   Loss 2.9478   LearningRate 0.0053   Epoch: 15   Global Step: 191160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:49:44,701-Speed 2934.16 samples/sec   Loss 3.0324   LearningRate 0.0053   Epoch: 15   Global Step: 191170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:49:48,185-Speed 2939.65 samples/sec   Loss 2.9663   LearningRate 0.0053   Epoch: 15   Global Step: 191180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:49:51,549-Speed 3044.97 samples/sec   Loss 3.0244   LearningRate 0.0053   Epoch: 15   Global Step: 191190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:49:54,938-Speed 3022.26 samples/sec   Loss 3.0559   LearningRate 0.0053   Epoch: 15   Global Step: 191200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:49:59,144-Speed 2435.43 samples/sec   Loss 3.0265   LearningRate 0.0053   Epoch: 15   Global Step: 191210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:50:02,497-Speed 3055.72 samples/sec   Loss 2.9661   LearningRate 0.0053   Epoch: 15   Global Step: 191220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:50:05,868-Speed 3038.55 samples/sec   Loss 3.0781   LearningRate 0.0053   Epoch: 15   Global Step: 191230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:50:09,335-Speed 2954.18 samples/sec   Loss 2.9913   LearningRate 0.0053   Epoch: 15   Global Step: 191240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:50:12,764-Speed 2987.62 samples/sec   Loss 3.0742   LearningRate 0.0053   Epoch: 15   Global Step: 191250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:50:16,195-Speed 2985.23 samples/sec   Loss 3.0097   LearningRate 0.0053   Epoch: 15   Global Step: 191260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:50:19,559-Speed 3044.82 samples/sec   Loss 2.9464   LearningRate 0.0053   Epoch: 15   Global Step: 191270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:50:22,884-Speed 3080.51 samples/sec   Loss 3.0275   LearningRate 0.0053   Epoch: 15   Global Step: 191280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:50:26,254-Speed 3039.31 samples/sec   Loss 3.0736   LearningRate 0.0053   Epoch: 15   Global Step: 191290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:50:29,618-Speed 3045.36 samples/sec   Loss 3.0009   LearningRate 0.0053   Epoch: 15   Global Step: 191300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:50:33,059-Speed 2977.19 samples/sec   Loss 3.0467   LearningRate 0.0053   Epoch: 15   Global Step: 191310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:50:36,475-Speed 2998.10 samples/sec   Loss 3.0409   LearningRate 0.0053   Epoch: 15   Global Step: 191320   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:50:39,812-Speed 3070.05 samples/sec   Loss 2.9776   LearningRate 0.0053   Epoch: 15   Global Step: 191330   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:50:43,143-Speed 3074.53 samples/sec   Loss 3.0175   LearningRate 0.0053   Epoch: 15   Global Step: 191340   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:50:46,511-Speed 3041.17 samples/sec   Loss 3.0180   LearningRate 0.0053   Epoch: 15   Global Step: 191350   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:50:49,873-Speed 3047.22 samples/sec   Loss 3.0336   LearningRate 0.0053   Epoch: 15   Global Step: 191360   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:50:53,327-Speed 2965.65 samples/sec   Loss 3.1276   LearningRate 0.0053   Epoch: 15   Global Step: 191370   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:50:56,706-Speed 3030.88 samples/sec   Loss 2.9845   LearningRate 0.0053   Epoch: 15   Global Step: 191380   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:51:00,127-Speed 2993.80 samples/sec   Loss 2.9174   LearningRate 0.0053   Epoch: 15   Global Step: 191390   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:51:03,532-Speed 3008.57 samples/sec   Loss 3.0175   LearningRate 0.0053   Epoch: 15   Global Step: 191400   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:51:06,922-Speed 3021.29 samples/sec   Loss 2.9882   LearningRate 0.0053   Epoch: 15   Global Step: 191410   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:51:10,291-Speed 3040.12 samples/sec   Loss 3.0189   LearningRate 0.0053   Epoch: 15   Global Step: 191420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:51:13,645-Speed 3053.99 samples/sec   Loss 3.0341   LearningRate 0.0053   Epoch: 15   Global Step: 191430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:51:17,034-Speed 3023.15 samples/sec   Loss 2.9875   LearningRate 0.0053   Epoch: 15   Global Step: 191440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:51:20,384-Speed 3057.02 samples/sec   Loss 3.0257   LearningRate 0.0053   Epoch: 15   Global Step: 191450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:51:23,746-Speed 3046.79 samples/sec   Loss 3.0030   LearningRate 0.0053   Epoch: 15   Global Step: 191460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:51:27,085-Speed 3068.23 samples/sec   Loss 2.9577   LearningRate 0.0053   Epoch: 15   Global Step: 191470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:51:30,407-Speed 3082.44 samples/sec   Loss 2.9923   LearningRate 0.0053   Epoch: 15   Global Step: 191480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:51:33,754-Speed 3060.96 samples/sec   Loss 3.0452   LearningRate 0.0053   Epoch: 15   Global Step: 191490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:51:37,122-Speed 3040.48 samples/sec   Loss 3.0444   LearningRate 0.0052   Epoch: 15   Global Step: 191500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:51:40,529-Speed 3006.47 samples/sec   Loss 2.9829   LearningRate 0.0052   Epoch: 15   Global Step: 191510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:51:43,987-Speed 2963.07 samples/sec   Loss 3.0401   LearningRate 0.0052   Epoch: 15   Global Step: 191520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:51:47,395-Speed 3005.09 samples/sec   Loss 2.9759   LearningRate 0.0052   Epoch: 15   Global Step: 191530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:51:50,733-Speed 3069.03 samples/sec   Loss 3.1009   LearningRate 0.0052   Epoch: 15   Global Step: 191540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:51:54,046-Speed 3091.13 samples/sec   Loss 3.0728   LearningRate 0.0052   Epoch: 15   Global Step: 191550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:51:57,463-Speed 2997.84 samples/sec   Loss 3.0059   LearningRate 0.0052   Epoch: 15   Global Step: 191560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:52:00,828-Speed 3044.30 samples/sec   Loss 2.9901   LearningRate 0.0052   Epoch: 15   Global Step: 191570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:52:04,249-Speed 2994.10 samples/sec   Loss 3.0588   LearningRate 0.0052   Epoch: 15   Global Step: 191580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:52:07,680-Speed 2985.28 samples/sec   Loss 2.9605   LearningRate 0.0052   Epoch: 15   Global Step: 191590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:52:11,031-Speed 3056.96 samples/sec   Loss 3.0614   LearningRate 0.0052   Epoch: 15   Global Step: 191600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:52:14,444-Speed 3001.30 samples/sec   Loss 3.0507   LearningRate 0.0052   Epoch: 15   Global Step: 191610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:52:17,758-Speed 3091.34 samples/sec   Loss 3.0139   LearningRate 0.0052   Epoch: 15   Global Step: 191620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:52:21,072-Speed 3090.25 samples/sec   Loss 3.0942   LearningRate 0.0052   Epoch: 15   Global Step: 191630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:52:24,463-Speed 3020.79 samples/sec   Loss 2.9298   LearningRate 0.0052   Epoch: 15   Global Step: 191640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:52:27,835-Speed 3037.53 samples/sec   Loss 3.0190   LearningRate 0.0052   Epoch: 15   Global Step: 191650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:52:31,235-Speed 3013.12 samples/sec   Loss 2.9957   LearningRate 0.0052   Epoch: 15   Global Step: 191660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:52:34,585-Speed 3058.13 samples/sec   Loss 3.0112   LearningRate 0.0052   Epoch: 15   Global Step: 191670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:52:37,926-Speed 3065.49 samples/sec   Loss 3.0935   LearningRate 0.0052   Epoch: 15   Global Step: 191680   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:52:41,368-Speed 2975.45 samples/sec   Loss 3.0393   LearningRate 0.0052   Epoch: 15   Global Step: 191690   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:52:44,780-Speed 3001.86 samples/sec   Loss 3.0798   LearningRate 0.0052   Epoch: 15   Global Step: 191700   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:52:48,211-Speed 2985.27 samples/sec   Loss 3.0830   LearningRate 0.0052   Epoch: 15   Global Step: 191710   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:52:51,683-Speed 2951.15 samples/sec   Loss 3.0279   LearningRate 0.0052   Epoch: 15   Global Step: 191720   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:52:55,117-Speed 2982.61 samples/sec   Loss 3.0468   LearningRate 0.0052   Epoch: 15   Global Step: 191730   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:52:58,616-Speed 2927.04 samples/sec   Loss 3.0368   LearningRate 0.0052   Epoch: 15   Global Step: 191740   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:53:02,064-Speed 2971.08 samples/sec   Loss 3.0592   LearningRate 0.0052   Epoch: 15   Global Step: 191750   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:53:05,415-Speed 3056.74 samples/sec   Loss 2.9972   LearningRate 0.0052   Epoch: 15   Global Step: 191760   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:53:08,886-Speed 2951.09 samples/sec   Loss 2.9904   LearningRate 0.0052   Epoch: 15   Global Step: 191770   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 19:53:12,347-Speed 2958.85 samples/sec   Loss 3.0193   LearningRate 0.0052   Epoch: 15   Global Step: 191780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:53:15,755-Speed 3005.75 samples/sec   Loss 3.0416   LearningRate 0.0052   Epoch: 15   Global Step: 191790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:53:19,171-Speed 2998.13 samples/sec   Loss 3.0199   LearningRate 0.0052   Epoch: 15   Global Step: 191800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:53:22,595-Speed 2992.19 samples/sec   Loss 3.0606   LearningRate 0.0052   Epoch: 15   Global Step: 191810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:53:26,009-Speed 3000.27 samples/sec   Loss 3.0470   LearningRate 0.0052   Epoch: 15   Global Step: 191820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:53:29,523-Speed 2914.94 samples/sec   Loss 3.0476   LearningRate 0.0052   Epoch: 15   Global Step: 191830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:53:32,895-Speed 3037.61 samples/sec   Loss 3.0326   LearningRate 0.0052   Epoch: 15   Global Step: 191840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:53:36,320-Speed 2989.67 samples/sec   Loss 3.0165   LearningRate 0.0052   Epoch: 15   Global Step: 191850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:53:39,758-Speed 2979.70 samples/sec   Loss 3.0360   LearningRate 0.0052   Epoch: 15   Global Step: 191860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:53:43,176-Speed 2996.74 samples/sec   Loss 2.9880   LearningRate 0.0052   Epoch: 15   Global Step: 191870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:53:46,510-Speed 3072.60 samples/sec   Loss 3.0624   LearningRate 0.0052   Epoch: 15   Global Step: 191880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:53:49,986-Speed 2946.71 samples/sec   Loss 3.0286   LearningRate 0.0052   Epoch: 15   Global Step: 191890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:53:53,411-Speed 2990.94 samples/sec   Loss 3.1164   LearningRate 0.0052   Epoch: 15   Global Step: 191900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:53:56,920-Speed 2918.90 samples/sec   Loss 3.0896   LearningRate 0.0052   Epoch: 15   Global Step: 191910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:54:00,277-Speed 3051.34 samples/sec   Loss 2.9910   LearningRate 0.0052   Epoch: 15   Global Step: 191920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:54:03,587-Speed 3093.95 samples/sec   Loss 3.0110   LearningRate 0.0052   Epoch: 15   Global Step: 191930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:54:06,897-Speed 3094.98 samples/sec   Loss 2.9898   LearningRate 0.0052   Epoch: 15   Global Step: 191940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:54:10,241-Speed 3062.56 samples/sec   Loss 3.0563   LearningRate 0.0052   Epoch: 15   Global Step: 191950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:54:13,637-Speed 3016.60 samples/sec   Loss 3.0330   LearningRate 0.0052   Epoch: 15   Global Step: 191960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:54:17,055-Speed 2996.49 samples/sec   Loss 2.9994   LearningRate 0.0052   Epoch: 15   Global Step: 191970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:54:20,492-Speed 2980.07 samples/sec   Loss 2.9689   LearningRate 0.0052   Epoch: 15   Global Step: 191980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:54:23,924-Speed 2985.16 samples/sec   Loss 2.9781   LearningRate 0.0052   Epoch: 15   Global Step: 191990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:54:27,285-Speed 3047.65 samples/sec   Loss 3.0541   LearningRate 0.0052   Epoch: 15   Global Step: 192000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:54:30,714-Speed 2987.48 samples/sec   Loss 3.0008   LearningRate 0.0052   Epoch: 15   Global Step: 192010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:54:34,149-Speed 2981.90 samples/sec   Loss 2.9227   LearningRate 0.0052   Epoch: 15   Global Step: 192020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:54:37,597-Speed 2969.94 samples/sec   Loss 3.0968   LearningRate 0.0052   Epoch: 15   Global Step: 192030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:54:40,967-Speed 3039.77 samples/sec   Loss 2.9320   LearningRate 0.0052   Epoch: 15   Global Step: 192040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:54:44,301-Speed 3072.43 samples/sec   Loss 3.0049   LearningRate 0.0051   Epoch: 15   Global Step: 192050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:54:47,641-Speed 3066.97 samples/sec   Loss 3.0505   LearningRate 0.0051   Epoch: 15   Global Step: 192060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:54:51,076-Speed 2981.45 samples/sec   Loss 3.0534   LearningRate 0.0051   Epoch: 15   Global Step: 192070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:54:54,542-Speed 2955.22 samples/sec   Loss 3.0302   LearningRate 0.0051   Epoch: 15   Global Step: 192080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:54:57,939-Speed 3014.89 samples/sec   Loss 2.9983   LearningRate 0.0051   Epoch: 15   Global Step: 192090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:55:01,306-Speed 3043.21 samples/sec   Loss 3.0823   LearningRate 0.0051   Epoch: 15   Global Step: 192100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:55:04,654-Speed 3059.02 samples/sec   Loss 3.0030   LearningRate 0.0051   Epoch: 15   Global Step: 192110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:55:08,034-Speed 3030.36 samples/sec   Loss 3.0983   LearningRate 0.0051   Epoch: 15   Global Step: 192120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:55:11,424-Speed 3021.13 samples/sec   Loss 3.0340   LearningRate 0.0051   Epoch: 15   Global Step: 192130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:55:14,857-Speed 2984.59 samples/sec   Loss 3.0781   LearningRate 0.0051   Epoch: 15   Global Step: 192140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:55:18,251-Speed 3017.89 samples/sec   Loss 3.0786   LearningRate 0.0051   Epoch: 15   Global Step: 192150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:55:21,663-Speed 3001.61 samples/sec   Loss 3.0045   LearningRate 0.0051   Epoch: 15   Global Step: 192160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:55:25,059-Speed 3016.15 samples/sec   Loss 3.0162   LearningRate 0.0051   Epoch: 15   Global Step: 192170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:55:28,406-Speed 3060.22 samples/sec   Loss 3.0848   LearningRate 0.0051   Epoch: 15   Global Step: 192180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:55:31,759-Speed 3055.25 samples/sec   Loss 2.9879   LearningRate 0.0051   Epoch: 15   Global Step: 192190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:55:35,162-Speed 3009.48 samples/sec   Loss 3.0542   LearningRate 0.0051   Epoch: 15   Global Step: 192200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:55:38,572-Speed 3003.92 samples/sec   Loss 2.9822   LearningRate 0.0051   Epoch: 15   Global Step: 192210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:55:41,921-Speed 3058.39 samples/sec   Loss 3.0286   LearningRate 0.0051   Epoch: 15   Global Step: 192220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:55:45,382-Speed 2960.63 samples/sec   Loss 3.0462   LearningRate 0.0051   Epoch: 15   Global Step: 192230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:55:48,792-Speed 3003.48 samples/sec   Loss 3.1283   LearningRate 0.0051   Epoch: 15   Global Step: 192240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:55:52,122-Speed 3076.19 samples/sec   Loss 3.0153   LearningRate 0.0051   Epoch: 15   Global Step: 192250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:55:55,515-Speed 3018.85 samples/sec   Loss 3.0730   LearningRate 0.0051   Epoch: 15   Global Step: 192260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:55:58,843-Speed 3078.10 samples/sec   Loss 3.1022   LearningRate 0.0051   Epoch: 15   Global Step: 192270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:56:02,216-Speed 3036.69 samples/sec   Loss 3.0392   LearningRate 0.0051   Epoch: 15   Global Step: 192280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:56:05,583-Speed 3042.32 samples/sec   Loss 3.0980   LearningRate 0.0051   Epoch: 15   Global Step: 192290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:56:08,962-Speed 3030.70 samples/sec   Loss 3.0339   LearningRate 0.0051   Epoch: 15   Global Step: 192300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:56:12,295-Speed 3073.53 samples/sec   Loss 3.1058   LearningRate 0.0051   Epoch: 15   Global Step: 192310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:56:15,669-Speed 3036.41 samples/sec   Loss 3.0169   LearningRate 0.0051   Epoch: 15   Global Step: 192320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:56:19,045-Speed 3033.54 samples/sec   Loss 3.0526   LearningRate 0.0051   Epoch: 15   Global Step: 192330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:56:22,435-Speed 3021.26 samples/sec   Loss 3.0189   LearningRate 0.0051   Epoch: 15   Global Step: 192340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:56:25,861-Speed 2990.50 samples/sec   Loss 3.0469   LearningRate 0.0051   Epoch: 15   Global Step: 192350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:56:29,203-Speed 3064.67 samples/sec   Loss 3.0629   LearningRate 0.0051   Epoch: 15   Global Step: 192360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:56:32,610-Speed 3006.29 samples/sec   Loss 3.0367   LearningRate 0.0051   Epoch: 15   Global Step: 192370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:56:36,069-Speed 2961.20 samples/sec   Loss 3.0403   LearningRate 0.0051   Epoch: 15   Global Step: 192380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:56:39,477-Speed 3005.22 samples/sec   Loss 3.0578   LearningRate 0.0051   Epoch: 15   Global Step: 192390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:56:42,893-Speed 2998.54 samples/sec   Loss 3.0536   LearningRate 0.0051   Epoch: 15   Global Step: 192400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:56:46,373-Speed 2944.01 samples/sec   Loss 3.0679   LearningRate 0.0051   Epoch: 15   Global Step: 192410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:56:49,788-Speed 2999.60 samples/sec   Loss 3.0707   LearningRate 0.0051   Epoch: 15   Global Step: 192420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:56:53,115-Speed 3078.44 samples/sec   Loss 3.0249   LearningRate 0.0051   Epoch: 15   Global Step: 192430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:56:56,534-Speed 2995.61 samples/sec   Loss 3.1245   LearningRate 0.0051   Epoch: 15   Global Step: 192440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:56:59,964-Speed 2986.18 samples/sec   Loss 3.0840   LearningRate 0.0051   Epoch: 15   Global Step: 192450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:03,371-Speed 3006.78 samples/sec   Loss 3.0382   LearningRate 0.0051   Epoch: 15   Global Step: 192460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:06,756-Speed 3025.59 samples/sec   Loss 3.0683   LearningRate 0.0051   Epoch: 15   Global Step: 192470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:10,147-Speed 3021.14 samples/sec   Loss 2.9906   LearningRate 0.0051   Epoch: 15   Global Step: 192480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:13,592-Speed 2972.37 samples/sec   Loss 3.1347   LearningRate 0.0051   Epoch: 15   Global Step: 192490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:17,059-Speed 2955.15 samples/sec   Loss 3.0565   LearningRate 0.0051   Epoch: 15   Global Step: 192500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:20,520-Speed 2959.30 samples/sec   Loss 3.0151   LearningRate 0.0051   Epoch: 15   Global Step: 192510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:23,921-Speed 3012.13 samples/sec   Loss 3.0753   LearningRate 0.0051   Epoch: 15   Global Step: 192520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:27,378-Speed 2962.73 samples/sec   Loss 3.0211   LearningRate 0.0051   Epoch: 15   Global Step: 192530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:57:30,858-Speed 2942.56 samples/sec   Loss 2.9843   LearningRate 0.0051   Epoch: 15   Global Step: 192540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:34,227-Speed 3040.95 samples/sec   Loss 3.0332   LearningRate 0.0051   Epoch: 15   Global Step: 192550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:37,658-Speed 2985.58 samples/sec   Loss 3.0619   LearningRate 0.0051   Epoch: 15   Global Step: 192560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:41,026-Speed 3041.23 samples/sec   Loss 2.9449   LearningRate 0.0051   Epoch: 15   Global Step: 192570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:44,500-Speed 2948.65 samples/sec   Loss 3.0488   LearningRate 0.0051   Epoch: 15   Global Step: 192580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:47,929-Speed 2986.49 samples/sec   Loss 3.0892   LearningRate 0.0051   Epoch: 15   Global Step: 192590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:57:51,271-Speed 3065.36 samples/sec   Loss 3.0796   LearningRate 0.0050   Epoch: 15   Global Step: 192600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:57:54,608-Speed 3069.45 samples/sec   Loss 3.0392   LearningRate 0.0050   Epoch: 15   Global Step: 192610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:57:57,997-Speed 3022.18 samples/sec   Loss 3.0916   LearningRate 0.0050   Epoch: 15   Global Step: 192620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:01,427-Speed 2986.82 samples/sec   Loss 3.0261   LearningRate 0.0050   Epoch: 15   Global Step: 192630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:04,802-Speed 3034.63 samples/sec   Loss 3.0722   LearningRate 0.0050   Epoch: 15   Global Step: 192640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:08,217-Speed 2998.99 samples/sec   Loss 3.0544   LearningRate 0.0050   Epoch: 15   Global Step: 192650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:11,600-Speed 3027.90 samples/sec   Loss 3.0472   LearningRate 0.0050   Epoch: 15   Global Step: 192660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:14,970-Speed 3039.41 samples/sec   Loss 2.9985   LearningRate 0.0050   Epoch: 15   Global Step: 192670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:18,327-Speed 3051.93 samples/sec   Loss 2.9931   LearningRate 0.0050   Epoch: 15   Global Step: 192680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:21,720-Speed 3018.31 samples/sec   Loss 3.0812   LearningRate 0.0050   Epoch: 15   Global Step: 192690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:25,094-Speed 3035.88 samples/sec   Loss 3.0660   LearningRate 0.0050   Epoch: 15   Global Step: 192700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:58:28,530-Speed 2981.55 samples/sec   Loss 3.0237   LearningRate 0.0050   Epoch: 15   Global Step: 192710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:58:31,971-Speed 2978.25 samples/sec   Loss 3.0221   LearningRate 0.0050   Epoch: 15   Global Step: 192720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:58:35,333-Speed 3046.71 samples/sec   Loss 3.0140   LearningRate 0.0050   Epoch: 15   Global Step: 192730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:58:38,648-Speed 3089.95 samples/sec   Loss 3.0650   LearningRate 0.0050   Epoch: 15   Global Step: 192740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:58:42,097-Speed 2969.73 samples/sec   Loss 3.0665   LearningRate 0.0050   Epoch: 15   Global Step: 192750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:45,536-Speed 2978.13 samples/sec   Loss 3.0547   LearningRate 0.0050   Epoch: 15   Global Step: 192760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:48,890-Speed 3054.08 samples/sec   Loss 3.0459   LearningRate 0.0050   Epoch: 15   Global Step: 192770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:52,298-Speed 3005.43 samples/sec   Loss 3.0629   LearningRate 0.0050   Epoch: 15   Global Step: 192780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:55,645-Speed 3060.41 samples/sec   Loss 3.1328   LearningRate 0.0050   Epoch: 15   Global Step: 192790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:58:59,076-Speed 2985.31 samples/sec   Loss 2.9715   LearningRate 0.0050   Epoch: 15   Global Step: 192800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:59:02,473-Speed 3014.95 samples/sec   Loss 3.0742   LearningRate 0.0050   Epoch: 15   Global Step: 192810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:59:05,815-Speed 3064.68 samples/sec   Loss 3.0420   LearningRate 0.0050   Epoch: 15   Global Step: 192820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:59:09,227-Speed 3002.03 samples/sec   Loss 3.0561   LearningRate 0.0050   Epoch: 15   Global Step: 192830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:59:12,559-Speed 3074.92 samples/sec   Loss 2.9652   LearningRate 0.0050   Epoch: 15   Global Step: 192840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 19:59:15,972-Speed 3000.90 samples/sec   Loss 3.0669   LearningRate 0.0050   Epoch: 15   Global Step: 192850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:59:19,433-Speed 2959.93 samples/sec   Loss 3.0850   LearningRate 0.0050   Epoch: 15   Global Step: 192860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:59:22,786-Speed 3054.25 samples/sec   Loss 3.0980   LearningRate 0.0050   Epoch: 15   Global Step: 192870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:59:26,123-Speed 3070.12 samples/sec   Loss 3.1032   LearningRate 0.0050   Epoch: 15   Global Step: 192880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:59:29,519-Speed 3015.28 samples/sec   Loss 3.0360   LearningRate 0.0050   Epoch: 15   Global Step: 192890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:59:32,877-Speed 3050.57 samples/sec   Loss 3.0934   LearningRate 0.0050   Epoch: 15   Global Step: 192900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:59:36,223-Speed 3061.30 samples/sec   Loss 3.0075   LearningRate 0.0050   Epoch: 15   Global Step: 192910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:59:39,586-Speed 3045.43 samples/sec   Loss 3.1512   LearningRate 0.0050   Epoch: 15   Global Step: 192920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:59:43,015-Speed 2987.97 samples/sec   Loss 3.0697   LearningRate 0.0050   Epoch: 15   Global Step: 192930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:59:46,379-Speed 3044.62 samples/sec   Loss 3.1073   LearningRate 0.0050   Epoch: 15   Global Step: 192940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:59:49,736-Speed 3051.02 samples/sec   Loss 3.1100   LearningRate 0.0050   Epoch: 15   Global Step: 192950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 19:59:53,104-Speed 3040.63 samples/sec   Loss 3.0845   LearningRate 0.0050   Epoch: 15   Global Step: 192960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:59:56,454-Speed 3057.96 samples/sec   Loss 3.0215   LearningRate 0.0050   Epoch: 15   Global Step: 192970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 19:59:59,807-Speed 3054.73 samples/sec   Loss 3.1349   LearningRate 0.0050   Epoch: 15   Global Step: 192980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:00:03,248-Speed 2976.29 samples/sec   Loss 3.0668   LearningRate 0.0050   Epoch: 15   Global Step: 192990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:00:06,708-Speed 2960.57 samples/sec   Loss 3.0226   LearningRate 0.0050   Epoch: 15   Global Step: 193000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:00:10,147-Speed 2978.35 samples/sec   Loss 3.0634   LearningRate 0.0050   Epoch: 15   Global Step: 193010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:00:13,543-Speed 3017.06 samples/sec   Loss 3.0956   LearningRate 0.0050   Epoch: 15   Global Step: 193020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:00:16,933-Speed 3020.36 samples/sec   Loss 3.0241   LearningRate 0.0050   Epoch: 15   Global Step: 193030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:00:20,386-Speed 2967.35 samples/sec   Loss 3.1173   LearningRate 0.0050   Epoch: 15   Global Step: 193040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:00:23,742-Speed 3051.50 samples/sec   Loss 3.1202   LearningRate 0.0050   Epoch: 15   Global Step: 193050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:00:27,075-Speed 3073.98 samples/sec   Loss 3.0198   LearningRate 0.0050   Epoch: 15   Global Step: 193060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:00:30,504-Speed 2986.59 samples/sec   Loss 3.0563   LearningRate 0.0050   Epoch: 15   Global Step: 193070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:00:33,851-Speed 3060.78 samples/sec   Loss 3.0260   LearningRate 0.0050   Epoch: 15   Global Step: 193080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:00:37,245-Speed 3018.15 samples/sec   Loss 3.0655   LearningRate 0.0050   Epoch: 15   Global Step: 193090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:00:40,580-Speed 3071.23 samples/sec   Loss 2.9820   LearningRate 0.0050   Epoch: 15   Global Step: 193100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:00:43,950-Speed 3039.84 samples/sec   Loss 3.1218   LearningRate 0.0050   Epoch: 15   Global Step: 193110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:00:47,350-Speed 3012.63 samples/sec   Loss 3.0547   LearningRate 0.0050   Epoch: 15   Global Step: 193120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:00:50,754-Speed 3008.50 samples/sec   Loss 3.0195   LearningRate 0.0050   Epoch: 15   Global Step: 193130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:00:54,099-Speed 3061.93 samples/sec   Loss 3.0543   LearningRate 0.0050   Epoch: 15   Global Step: 193140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:00:57,524-Speed 2991.45 samples/sec   Loss 3.0829   LearningRate 0.0050   Epoch: 15   Global Step: 193150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:00,951-Speed 2988.93 samples/sec   Loss 3.0821   LearningRate 0.0049   Epoch: 15   Global Step: 193160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:04,383-Speed 2983.88 samples/sec   Loss 3.1515   LearningRate 0.0049   Epoch: 15   Global Step: 193170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:07,824-Speed 2976.72 samples/sec   Loss 3.0069   LearningRate 0.0049   Epoch: 15   Global Step: 193180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:11,172-Speed 3059.89 samples/sec   Loss 3.0307   LearningRate 0.0049   Epoch: 15   Global Step: 193190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:14,545-Speed 3036.49 samples/sec   Loss 3.1668   LearningRate 0.0049   Epoch: 15   Global Step: 193200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:17,920-Speed 3035.65 samples/sec   Loss 3.1382   LearningRate 0.0049   Epoch: 15   Global Step: 193210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:21,299-Speed 3030.98 samples/sec   Loss 3.1441   LearningRate 0.0049   Epoch: 15   Global Step: 193220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:24,756-Speed 2962.42 samples/sec   Loss 3.0239   LearningRate 0.0049   Epoch: 15   Global Step: 193230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:28,258-Speed 2925.50 samples/sec   Loss 3.0454   LearningRate 0.0049   Epoch: 15   Global Step: 193240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:31,747-Speed 2935.62 samples/sec   Loss 3.0953   LearningRate 0.0049   Epoch: 15   Global Step: 193250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:35,151-Speed 3008.72 samples/sec   Loss 3.0206   LearningRate 0.0049   Epoch: 15   Global Step: 193260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:38,641-Speed 2935.39 samples/sec   Loss 3.1097   LearningRate 0.0049   Epoch: 15   Global Step: 193270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:42,051-Speed 3003.43 samples/sec   Loss 3.0866   LearningRate 0.0049   Epoch: 15   Global Step: 193280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:45,470-Speed 2995.72 samples/sec   Loss 3.0100   LearningRate 0.0049   Epoch: 15   Global Step: 193290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:48,879-Speed 3004.32 samples/sec   Loss 3.1259   LearningRate 0.0049   Epoch: 15   Global Step: 193300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:52,329-Speed 2969.32 samples/sec   Loss 3.0313   LearningRate 0.0049   Epoch: 15   Global Step: 193310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:55,682-Speed 3055.48 samples/sec   Loss 3.0689   LearningRate 0.0049   Epoch: 15   Global Step: 193320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:01:59,149-Speed 2953.77 samples/sec   Loss 3.1302   LearningRate 0.0049   Epoch: 15   Global Step: 193330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:02,545-Speed 3016.61 samples/sec   Loss 3.0870   LearningRate 0.0049   Epoch: 15   Global Step: 193340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:05,952-Speed 3006.25 samples/sec   Loss 3.0590   LearningRate 0.0049   Epoch: 15   Global Step: 193350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:09,358-Speed 3008.02 samples/sec   Loss 3.0737   LearningRate 0.0049   Epoch: 15   Global Step: 193360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:12,765-Speed 3006.03 samples/sec   Loss 3.1166   LearningRate 0.0049   Epoch: 15   Global Step: 193370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:16,275-Speed 2918.57 samples/sec   Loss 3.0612   LearningRate 0.0049   Epoch: 15   Global Step: 193380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:19,633-Speed 3050.59 samples/sec   Loss 3.1057   LearningRate 0.0049   Epoch: 15   Global Step: 193390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:23,016-Speed 3026.97 samples/sec   Loss 3.0449   LearningRate 0.0049   Epoch: 15   Global Step: 193400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:26,381-Speed 3044.06 samples/sec   Loss 3.0199   LearningRate 0.0049   Epoch: 15   Global Step: 193410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:02:29,765-Speed 3027.32 samples/sec   Loss 3.0972   LearningRate 0.0049   Epoch: 15   Global Step: 193420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:02:33,121-Speed 3051.39 samples/sec   Loss 3.1155   LearningRate 0.0049   Epoch: 15   Global Step: 193430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:02:36,524-Speed 3010.03 samples/sec   Loss 3.0845   LearningRate 0.0049   Epoch: 15   Global Step: 193440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:39,855-Speed 3075.91 samples/sec   Loss 2.9705   LearningRate 0.0049   Epoch: 15   Global Step: 193450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:43,171-Speed 3088.77 samples/sec   Loss 3.0599   LearningRate 0.0049   Epoch: 15   Global Step: 193460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:46,517-Speed 3060.90 samples/sec   Loss 3.0772   LearningRate 0.0049   Epoch: 15   Global Step: 193470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:49,967-Speed 2968.50 samples/sec   Loss 3.0786   LearningRate 0.0049   Epoch: 15   Global Step: 193480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:53,358-Speed 3020.56 samples/sec   Loss 3.0682   LearningRate 0.0049   Epoch: 15   Global Step: 193490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:02:56,745-Speed 3024.88 samples/sec   Loss 3.0883   LearningRate 0.0049   Epoch: 15   Global Step: 193500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:00,155-Speed 3003.53 samples/sec   Loss 3.0897   LearningRate 0.0049   Epoch: 15   Global Step: 193510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:03,580-Speed 2990.57 samples/sec   Loss 3.0910   LearningRate 0.0049   Epoch: 15   Global Step: 193520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:06,970-Speed 3020.81 samples/sec   Loss 3.0338   LearningRate 0.0049   Epoch: 15   Global Step: 193530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:10,362-Speed 3020.11 samples/sec   Loss 3.0939   LearningRate 0.0049   Epoch: 15   Global Step: 193540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:03:13,763-Speed 3012.10 samples/sec   Loss 3.1034   LearningRate 0.0049   Epoch: 15   Global Step: 193550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:03:17,109-Speed 3060.58 samples/sec   Loss 3.1676   LearningRate 0.0049   Epoch: 15   Global Step: 193560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:03:20,498-Speed 3022.48 samples/sec   Loss 3.0733   LearningRate 0.0049   Epoch: 15   Global Step: 193570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:03:23,865-Speed 3041.63 samples/sec   Loss 3.1187   LearningRate 0.0049   Epoch: 15   Global Step: 193580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:03:27,254-Speed 3022.44 samples/sec   Loss 3.0340   LearningRate 0.0049   Epoch: 15   Global Step: 193590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:30,628-Speed 3036.04 samples/sec   Loss 3.0906   LearningRate 0.0049   Epoch: 15   Global Step: 193600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:33,974-Speed 3061.40 samples/sec   Loss 3.0341   LearningRate 0.0049   Epoch: 15   Global Step: 193610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:37,411-Speed 2979.94 samples/sec   Loss 3.0687   LearningRate 0.0049   Epoch: 15   Global Step: 193620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:40,791-Speed 3030.73 samples/sec   Loss 3.1256   LearningRate 0.0049   Epoch: 15   Global Step: 193630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:44,202-Speed 3002.77 samples/sec   Loss 3.1272   LearningRate 0.0049   Epoch: 15   Global Step: 193640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:47,565-Speed 3045.66 samples/sec   Loss 3.0699   LearningRate 0.0049   Epoch: 15   Global Step: 193650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:50,981-Speed 2998.71 samples/sec   Loss 3.0811   LearningRate 0.0049   Epoch: 15   Global Step: 193660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:54,372-Speed 3020.35 samples/sec   Loss 3.0445   LearningRate 0.0049   Epoch: 15   Global Step: 193670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:03:57,751-Speed 3032.06 samples/sec   Loss 3.0202   LearningRate 0.0049   Epoch: 15   Global Step: 193680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:04:01,141-Speed 3020.73 samples/sec   Loss 3.1394   LearningRate 0.0049   Epoch: 15   Global Step: 193690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:04:04,566-Speed 2990.64 samples/sec   Loss 3.0377   LearningRate 0.0049   Epoch: 15   Global Step: 193700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:04:07,932-Speed 3042.92 samples/sec   Loss 3.0539   LearningRate 0.0049   Epoch: 15   Global Step: 193710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:04:11,367-Speed 2982.93 samples/sec   Loss 3.0371   LearningRate 0.0048   Epoch: 15   Global Step: 193720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:04:14,750-Speed 3027.52 samples/sec   Loss 3.0474   LearningRate 0.0048   Epoch: 15   Global Step: 193730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:04:18,116-Speed 3042.44 samples/sec   Loss 3.0644   LearningRate 0.0048   Epoch: 15   Global Step: 193740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:04:21,444-Speed 3078.54 samples/sec   Loss 2.9837   LearningRate 0.0048   Epoch: 15   Global Step: 193750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:04:24,841-Speed 3014.85 samples/sec   Loss 3.0169   LearningRate 0.0048   Epoch: 15   Global Step: 193760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:04:28,253-Speed 3002.34 samples/sec   Loss 3.0376   LearningRate 0.0048   Epoch: 15   Global Step: 193770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:04:31,648-Speed 3017.37 samples/sec   Loss 2.9976   LearningRate 0.0048   Epoch: 15   Global Step: 193780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:04:35,028-Speed 3029.89 samples/sec   Loss 3.0397   LearningRate 0.0048   Epoch: 15   Global Step: 193790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:04:38,390-Speed 3046.95 samples/sec   Loss 2.9960   LearningRate 0.0048   Epoch: 15   Global Step: 193800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:04:41,783-Speed 3019.16 samples/sec   Loss 3.1239   LearningRate 0.0048   Epoch: 15   Global Step: 193810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:04:45,096-Speed 3091.00 samples/sec   Loss 3.1083   LearningRate 0.0048   Epoch: 15   Global Step: 193820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:04:48,444-Speed 3059.61 samples/sec   Loss 3.0137   LearningRate 0.0048   Epoch: 15   Global Step: 193830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:04:51,920-Speed 2946.43 samples/sec   Loss 3.1282   LearningRate 0.0048   Epoch: 15   Global Step: 193840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:04:55,346-Speed 2990.31 samples/sec   Loss 3.0707   LearningRate 0.0048   Epoch: 15   Global Step: 193850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:04:58,762-Speed 2997.67 samples/sec   Loss 3.0334   LearningRate 0.0048   Epoch: 15   Global Step: 193860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:05:02,190-Speed 2988.88 samples/sec   Loss 3.0449   LearningRate 0.0048   Epoch: 15   Global Step: 193870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:05:05,516-Speed 3079.40 samples/sec   Loss 3.1045   LearningRate 0.0048   Epoch: 15   Global Step: 193880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:05:08,903-Speed 3024.45 samples/sec   Loss 3.0389   LearningRate 0.0048   Epoch: 15   Global Step: 193890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:05:12,275-Speed 3037.91 samples/sec   Loss 3.0383   LearningRate 0.0048   Epoch: 15   Global Step: 193900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:05:15,597-Speed 3082.81 samples/sec   Loss 3.0447   LearningRate 0.0048   Epoch: 15   Global Step: 193910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:05:19,026-Speed 2986.80 samples/sec   Loss 3.0983   LearningRate 0.0048   Epoch: 15   Global Step: 193920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:05:22,368-Speed 3064.94 samples/sec   Loss 3.0496   LearningRate 0.0048   Epoch: 15   Global Step: 193930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:05:25,709-Speed 3065.91 samples/sec   Loss 3.1423   LearningRate 0.0048   Epoch: 15   Global Step: 193940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:05:29,058-Speed 3058.27 samples/sec   Loss 3.0991   LearningRate 0.0048   Epoch: 15   Global Step: 193950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:05:32,414-Speed 3052.29 samples/sec   Loss 3.0875   LearningRate 0.0048   Epoch: 15   Global Step: 193960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:05:35,784-Speed 3040.26 samples/sec   Loss 3.0986   LearningRate 0.0048   Epoch: 15   Global Step: 193970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:05:39,126-Speed 3064.66 samples/sec   Loss 3.0354   LearningRate 0.0048   Epoch: 15   Global Step: 193980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:05:42,498-Speed 3037.22 samples/sec   Loss 3.0750   LearningRate 0.0048   Epoch: 15   Global Step: 193990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:05:45,966-Speed 2953.66 samples/sec   Loss 3.1062   LearningRate 0.0048   Epoch: 15   Global Step: 194000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:05:49,498-Speed 2900.36 samples/sec   Loss 2.9639   LearningRate 0.0048   Epoch: 15   Global Step: 194010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:05:52,848-Speed 3057.94 samples/sec   Loss 3.0657   LearningRate 0.0048   Epoch: 15   Global Step: 194020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:05:56,240-Speed 3019.74 samples/sec   Loss 3.0933   LearningRate 0.0048   Epoch: 15   Global Step: 194030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:05:59,574-Speed 3072.18 samples/sec   Loss 3.0793   LearningRate 0.0048   Epoch: 15   Global Step: 194040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:06:03,054-Speed 2944.30 samples/sec   Loss 2.9928   LearningRate 0.0048   Epoch: 15   Global Step: 194050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:06:06,417-Speed 3045.73 samples/sec   Loss 3.0845   LearningRate 0.0048   Epoch: 15   Global Step: 194060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:06:09,876-Speed 2961.23 samples/sec   Loss 3.0587   LearningRate 0.0048   Epoch: 15   Global Step: 194070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:06:13,248-Speed 3037.94 samples/sec   Loss 3.0553   LearningRate 0.0048   Epoch: 15   Global Step: 194080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:06:16,634-Speed 3024.98 samples/sec   Loss 3.0666   LearningRate 0.0048   Epoch: 15   Global Step: 194090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:06:19,968-Speed 3072.21 samples/sec   Loss 3.0724   LearningRate 0.0048   Epoch: 15   Global Step: 194100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:06:23,436-Speed 2954.01 samples/sec   Loss 3.0769   LearningRate 0.0048   Epoch: 15   Global Step: 194110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:06:26,753-Speed 3087.65 samples/sec   Loss 3.0581   LearningRate 0.0048   Epoch: 15   Global Step: 194120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:06:30,173-Speed 2994.80 samples/sec   Loss 3.1162   LearningRate 0.0048   Epoch: 15   Global Step: 194130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:06:33,527-Speed 3054.26 samples/sec   Loss 3.1077   LearningRate 0.0048   Epoch: 15   Global Step: 194140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:06:36,901-Speed 3036.03 samples/sec   Loss 3.0720   LearningRate 0.0048   Epoch: 15   Global Step: 194150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:06:40,239-Speed 3069.00 samples/sec   Loss 3.0646   LearningRate 0.0048   Epoch: 15   Global Step: 194160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:06:43,709-Speed 2951.69 samples/sec   Loss 3.0309   LearningRate 0.0048   Epoch: 15   Global Step: 194170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:06:47,058-Speed 3058.00 samples/sec   Loss 3.0490   LearningRate 0.0048   Epoch: 15   Global Step: 194180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:06:50,553-Speed 2931.13 samples/sec   Loss 3.0464   LearningRate 0.0048   Epoch: 15   Global Step: 194190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:06:54,084-Speed 2900.97 samples/sec   Loss 2.9701   LearningRate 0.0048   Epoch: 15   Global Step: 194200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:06:57,480-Speed 3015.56 samples/sec   Loss 3.0427   LearningRate 0.0048   Epoch: 15   Global Step: 194210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:07:00,871-Speed 3021.15 samples/sec   Loss 3.1577   LearningRate 0.0048   Epoch: 15   Global Step: 194220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:07:04,195-Speed 3081.44 samples/sec   Loss 3.0767   LearningRate 0.0048   Epoch: 15   Global Step: 194230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:07:07,562-Speed 3041.67 samples/sec   Loss 3.0995   LearningRate 0.0048   Epoch: 15   Global Step: 194240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:07:10,891-Speed 3077.13 samples/sec   Loss 3.0654   LearningRate 0.0048   Epoch: 15   Global Step: 194250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:07:14,239-Speed 3059.83 samples/sec   Loss 3.1194   LearningRate 0.0048   Epoch: 15   Global Step: 194260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:07:17,596-Speed 3050.95 samples/sec   Loss 3.0589   LearningRate 0.0048   Epoch: 15   Global Step: 194270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:07:20,978-Speed 3028.85 samples/sec   Loss 3.1103   LearningRate 0.0047   Epoch: 15   Global Step: 194280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:07:24,403-Speed 2990.85 samples/sec   Loss 3.1472   LearningRate 0.0047   Epoch: 15   Global Step: 194290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:07:27,746-Speed 3063.22 samples/sec   Loss 3.0867   LearningRate 0.0047   Epoch: 15   Global Step: 194300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:07:31,126-Speed 3030.60 samples/sec   Loss 3.0965   LearningRate 0.0047   Epoch: 15   Global Step: 194310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:07:34,551-Speed 2990.60 samples/sec   Loss 3.0595   LearningRate 0.0047   Epoch: 15   Global Step: 194320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:07:37,882-Speed 3075.77 samples/sec   Loss 3.0667   LearningRate 0.0047   Epoch: 15   Global Step: 194330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:07:41,222-Speed 3065.99 samples/sec   Loss 3.0374   LearningRate 0.0047   Epoch: 15   Global Step: 194340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:07:44,586-Speed 3045.19 samples/sec   Loss 3.0730   LearningRate 0.0047   Epoch: 15   Global Step: 194350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:07:48,031-Speed 2973.18 samples/sec   Loss 3.1545   LearningRate 0.0047   Epoch: 15   Global Step: 194360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:07:51,467-Speed 2980.63 samples/sec   Loss 3.0251   LearningRate 0.0047   Epoch: 15   Global Step: 194370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:07:54,797-Speed 3076.61 samples/sec   Loss 3.0562   LearningRate 0.0047   Epoch: 15   Global Step: 194380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:07:58,141-Speed 3063.23 samples/sec   Loss 2.9823   LearningRate 0.0047   Epoch: 15   Global Step: 194390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:08:01,502-Speed 3047.59 samples/sec   Loss 2.9896   LearningRate 0.0047   Epoch: 15   Global Step: 194400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:08:04,865-Speed 3045.94 samples/sec   Loss 3.1502   LearningRate 0.0047   Epoch: 15   Global Step: 194410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:08:08,285-Speed 2995.49 samples/sec   Loss 3.0040   LearningRate 0.0047   Epoch: 15   Global Step: 194420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:08:11,757-Speed 2949.79 samples/sec   Loss 3.0692   LearningRate 0.0047   Epoch: 15   Global Step: 194430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:08:15,130-Speed 3036.36 samples/sec   Loss 3.0873   LearningRate 0.0047   Epoch: 15   Global Step: 194440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:08:18,514-Speed 3027.10 samples/sec   Loss 3.0587   LearningRate 0.0047   Epoch: 15   Global Step: 194450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:08:21,892-Speed 3032.99 samples/sec   Loss 3.0908   LearningRate 0.0047   Epoch: 15   Global Step: 194460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:08:25,273-Speed 3029.54 samples/sec   Loss 3.1322   LearningRate 0.0047   Epoch: 15   Global Step: 194470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:08:28,588-Speed 3089.53 samples/sec   Loss 3.1222   LearningRate 0.0047   Epoch: 15   Global Step: 194480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:08:31,975-Speed 3024.72 samples/sec   Loss 3.1020   LearningRate 0.0047   Epoch: 15   Global Step: 194490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:08:35,433-Speed 2962.78 samples/sec   Loss 3.0603   LearningRate 0.0047   Epoch: 15   Global Step: 194500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:08:38,945-Speed 2916.79 samples/sec   Loss 3.1171   LearningRate 0.0047   Epoch: 15   Global Step: 194510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:08:42,272-Speed 3078.46 samples/sec   Loss 3.0712   LearningRate 0.0047   Epoch: 15   Global Step: 194520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:08:45,729-Speed 2963.18 samples/sec   Loss 3.0529   LearningRate 0.0047   Epoch: 15   Global Step: 194530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:08:49,155-Speed 2988.91 samples/sec   Loss 3.1230   LearningRate 0.0047   Epoch: 15   Global Step: 194540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:08:52,544-Speed 3022.81 samples/sec   Loss 3.0955   LearningRate 0.0047   Epoch: 15   Global Step: 194550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:08:55,891-Speed 3060.14 samples/sec   Loss 3.1852   LearningRate 0.0047   Epoch: 15   Global Step: 194560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:08:59,261-Speed 3039.22 samples/sec   Loss 3.0724   LearningRate 0.0047   Epoch: 15   Global Step: 194570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:09:02,659-Speed 3014.32 samples/sec   Loss 3.1321   LearningRate 0.0047   Epoch: 15   Global Step: 194580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:09:05,993-Speed 3072.48 samples/sec   Loss 3.0714   LearningRate 0.0047   Epoch: 15   Global Step: 194590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:09:09,476-Speed 2940.62 samples/sec   Loss 3.0841   LearningRate 0.0047   Epoch: 15   Global Step: 194600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:09:12,930-Speed 2965.39 samples/sec   Loss 3.0701   LearningRate 0.0047   Epoch: 15   Global Step: 194610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:09:16,305-Speed 3035.90 samples/sec   Loss 3.1482   LearningRate 0.0047   Epoch: 15   Global Step: 194620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:09:19,619-Speed 3090.44 samples/sec   Loss 3.1592   LearningRate 0.0047   Epoch: 15   Global Step: 194630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:09:22,976-Speed 3050.82 samples/sec   Loss 2.9595   LearningRate 0.0047   Epoch: 15   Global Step: 194640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:09:26,340-Speed 3044.81 samples/sec   Loss 3.0384   LearningRate 0.0047   Epoch: 15   Global Step: 194650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:09:29,750-Speed 3004.50 samples/sec   Loss 3.1257   LearningRate 0.0047   Epoch: 15   Global Step: 194660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:09:33,222-Speed 2949.82 samples/sec   Loss 3.0288   LearningRate 0.0047   Epoch: 15   Global Step: 194670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:09:36,656-Speed 2984.16 samples/sec   Loss 3.0698   LearningRate 0.0047   Epoch: 15   Global Step: 194680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:09:40,076-Speed 2994.69 samples/sec   Loss 3.1973   LearningRate 0.0047   Epoch: 15   Global Step: 194690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:09:43,448-Speed 3037.61 samples/sec   Loss 3.1261   LearningRate 0.0047   Epoch: 15   Global Step: 194700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:09:46,837-Speed 3022.78 samples/sec   Loss 3.0866   LearningRate 0.0047   Epoch: 15   Global Step: 194710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:09:50,203-Speed 3042.31 samples/sec   Loss 2.9421   LearningRate 0.0047   Epoch: 15   Global Step: 194720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:09:53,599-Speed 3016.91 samples/sec   Loss 3.0687   LearningRate 0.0047   Epoch: 15   Global Step: 194730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:09:56,972-Speed 3036.98 samples/sec   Loss 3.0307   LearningRate 0.0047   Epoch: 15   Global Step: 194740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:00,412-Speed 2976.76 samples/sec   Loss 3.0500   LearningRate 0.0047   Epoch: 15   Global Step: 194750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:03,923-Speed 2917.41 samples/sec   Loss 3.0207   LearningRate 0.0047   Epoch: 15   Global Step: 194760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:07,343-Speed 2995.53 samples/sec   Loss 3.0377   LearningRate 0.0047   Epoch: 15   Global Step: 194770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:10,662-Speed 3085.56 samples/sec   Loss 3.1695   LearningRate 0.0047   Epoch: 15   Global Step: 194780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:14,048-Speed 3025.59 samples/sec   Loss 3.1177   LearningRate 0.0047   Epoch: 15   Global Step: 194790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:17,487-Speed 2978.00 samples/sec   Loss 3.1270   LearningRate 0.0047   Epoch: 15   Global Step: 194800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:20,837-Speed 3057.86 samples/sec   Loss 2.9949   LearningRate 0.0047   Epoch: 15   Global Step: 194810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:24,212-Speed 3034.98 samples/sec   Loss 3.1294   LearningRate 0.0047   Epoch: 15   Global Step: 194820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:27,663-Speed 2968.41 samples/sec   Loss 3.1059   LearningRate 0.0047   Epoch: 15   Global Step: 194830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:10:31,044-Speed 3029.67 samples/sec   Loss 3.0520   LearningRate 0.0047   Epoch: 15   Global Step: 194840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:10:34,469-Speed 2990.39 samples/sec   Loss 3.0400   LearningRate 0.0047   Epoch: 15   Global Step: 194850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:10:37,872-Speed 3009.94 samples/sec   Loss 3.0845   LearningRate 0.0046   Epoch: 15   Global Step: 194860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:10:41,233-Speed 3047.85 samples/sec   Loss 3.1461   LearningRate 0.0046   Epoch: 15   Global Step: 194870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:44,623-Speed 3021.60 samples/sec   Loss 3.0291   LearningRate 0.0046   Epoch: 15   Global Step: 194880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:48,004-Speed 3029.24 samples/sec   Loss 3.0828   LearningRate 0.0046   Epoch: 15   Global Step: 194890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:51,451-Speed 2971.10 samples/sec   Loss 3.0776   LearningRate 0.0046   Epoch: 15   Global Step: 194900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:54,839-Speed 3023.81 samples/sec   Loss 3.1229   LearningRate 0.0046   Epoch: 15   Global Step: 194910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:10:58,189-Speed 3057.41 samples/sec   Loss 3.0331   LearningRate 0.0046   Epoch: 15   Global Step: 194920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:11:01,570-Speed 3029.67 samples/sec   Loss 3.1019   LearningRate 0.0046   Epoch: 15   Global Step: 194930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:11:04,957-Speed 3024.22 samples/sec   Loss 3.0483   LearningRate 0.0046   Epoch: 15   Global Step: 194940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:11:08,414-Speed 2962.57 samples/sec   Loss 2.9976   LearningRate 0.0046   Epoch: 15   Global Step: 194950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:11:11,885-Speed 2951.08 samples/sec   Loss 3.0809   LearningRate 0.0046   Epoch: 15   Global Step: 194960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:11:15,272-Speed 3023.93 samples/sec   Loss 3.0999   LearningRate 0.0046   Epoch: 15   Global Step: 194970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:11:18,634-Speed 3047.41 samples/sec   Loss 3.0496   LearningRate 0.0046   Epoch: 15   Global Step: 194980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:11:21,982-Speed 3059.09 samples/sec   Loss 3.1158   LearningRate 0.0046   Epoch: 15   Global Step: 194990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:11:25,415-Speed 2984.02 samples/sec   Loss 3.1193   LearningRate 0.0046   Epoch: 15   Global Step: 195000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:11:28,751-Speed 3069.84 samples/sec   Loss 3.0681   LearningRate 0.0046   Epoch: 15   Global Step: 195010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:11:32,135-Speed 3026.92 samples/sec   Loss 3.0662   LearningRate 0.0046   Epoch: 15   Global Step: 195020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:11:35,456-Speed 3084.20 samples/sec   Loss 3.0576   LearningRate 0.0046   Epoch: 15   Global Step: 195030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:11:38,835-Speed 3031.49 samples/sec   Loss 3.0796   LearningRate 0.0046   Epoch: 15   Global Step: 195040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:11:42,254-Speed 2996.05 samples/sec   Loss 3.0503   LearningRate 0.0046   Epoch: 15   Global Step: 195050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:11:45,618-Speed 3044.35 samples/sec   Loss 3.1401   LearningRate 0.0046   Epoch: 15   Global Step: 195060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:11:49,049-Speed 2985.94 samples/sec   Loss 3.0011   LearningRate 0.0046   Epoch: 15   Global Step: 195070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:11:52,444-Speed 3016.65 samples/sec   Loss 3.1510   LearningRate 0.0046   Epoch: 15   Global Step: 195080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:11:55,843-Speed 3013.69 samples/sec   Loss 3.0336   LearningRate 0.0046   Epoch: 15   Global Step: 195090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:11:59,253-Speed 3003.62 samples/sec   Loss 3.0906   LearningRate 0.0046   Epoch: 15   Global Step: 195100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:02,607-Speed 3053.51 samples/sec   Loss 3.1322   LearningRate 0.0046   Epoch: 15   Global Step: 195110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:06,004-Speed 3015.14 samples/sec   Loss 3.0830   LearningRate 0.0046   Epoch: 15   Global Step: 195120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:09,366-Speed 3046.94 samples/sec   Loss 3.0464   LearningRate 0.0046   Epoch: 15   Global Step: 195130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:12,753-Speed 3024.36 samples/sec   Loss 2.9971   LearningRate 0.0046   Epoch: 15   Global Step: 195140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:16,098-Speed 3062.69 samples/sec   Loss 3.1239   LearningRate 0.0046   Epoch: 15   Global Step: 195150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:19,458-Speed 3048.43 samples/sec   Loss 3.0852   LearningRate 0.0046   Epoch: 15   Global Step: 195160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:22,833-Speed 3035.20 samples/sec   Loss 3.0882   LearningRate 0.0046   Epoch: 15   Global Step: 195170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:26,290-Speed 2963.04 samples/sec   Loss 3.1033   LearningRate 0.0046   Epoch: 15   Global Step: 195180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:29,665-Speed 3035.01 samples/sec   Loss 3.1168   LearningRate 0.0046   Epoch: 15   Global Step: 195190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:12:33,141-Speed 2946.95 samples/sec   Loss 3.0205   LearningRate 0.0046   Epoch: 15   Global Step: 195200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:12:36,552-Speed 3002.40 samples/sec   Loss 3.0362   LearningRate 0.0046   Epoch: 15   Global Step: 195210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:12:39,914-Speed 3047.14 samples/sec   Loss 3.0701   LearningRate 0.0046   Epoch: 15   Global Step: 195220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:12:43,366-Speed 2967.04 samples/sec   Loss 3.0817   LearningRate 0.0046   Epoch: 15   Global Step: 195230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:46,743-Speed 3032.49 samples/sec   Loss 3.1358   LearningRate 0.0046   Epoch: 15   Global Step: 195240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:50,111-Speed 3041.60 samples/sec   Loss 3.1056   LearningRate 0.0046   Epoch: 15   Global Step: 195250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:53,439-Speed 3077.94 samples/sec   Loss 3.0277   LearningRate 0.0046   Epoch: 15   Global Step: 195260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:12:56,807-Speed 3040.92 samples/sec   Loss 3.0756   LearningRate 0.0046   Epoch: 15   Global Step: 195270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:13:00,195-Speed 3023.26 samples/sec   Loss 2.9842   LearningRate 0.0046   Epoch: 15   Global Step: 195280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:13:03,609-Speed 3000.87 samples/sec   Loss 3.0631   LearningRate 0.0046   Epoch: 15   Global Step: 195290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:13:06,976-Speed 3041.95 samples/sec   Loss 3.0258   LearningRate 0.0046   Epoch: 15   Global Step: 195300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:13:10,359-Speed 3027.27 samples/sec   Loss 3.0868   LearningRate 0.0046   Epoch: 15   Global Step: 195310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:13:13,769-Speed 3004.22 samples/sec   Loss 3.1228   LearningRate 0.0046   Epoch: 15   Global Step: 195320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:13:17,194-Speed 2990.80 samples/sec   Loss 3.0302   LearningRate 0.0046   Epoch: 15   Global Step: 195330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:13:20,582-Speed 3023.26 samples/sec   Loss 3.1448   LearningRate 0.0046   Epoch: 15   Global Step: 195340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:13:23,901-Speed 3085.39 samples/sec   Loss 3.0650   LearningRate 0.0046   Epoch: 15   Global Step: 195350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:13:27,310-Speed 3005.20 samples/sec   Loss 3.1627   LearningRate 0.0046   Epoch: 15   Global Step: 195360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:13:30,694-Speed 3026.42 samples/sec   Loss 3.1468   LearningRate 0.0046   Epoch: 15   Global Step: 195370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:13:34,093-Speed 3013.48 samples/sec   Loss 3.0201   LearningRate 0.0046   Epoch: 15   Global Step: 195380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:13:37,390-Speed 3107.40 samples/sec   Loss 3.1797   LearningRate 0.0046   Epoch: 15   Global Step: 195390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:13:40,896-Speed 2921.28 samples/sec   Loss 3.0912   LearningRate 0.0046   Epoch: 15   Global Step: 195400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:13:44,297-Speed 3011.38 samples/sec   Loss 3.1095   LearningRate 0.0046   Epoch: 15   Global Step: 195410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:13:47,675-Speed 3032.41 samples/sec   Loss 3.1413   LearningRate 0.0046   Epoch: 15   Global Step: 195420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:13:51,031-Speed 3052.00 samples/sec   Loss 3.1256   LearningRate 0.0046   Epoch: 15   Global Step: 195430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:13:54,393-Speed 3046.86 samples/sec   Loss 3.1569   LearningRate 0.0045   Epoch: 15   Global Step: 195440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:13:57,735-Speed 3064.77 samples/sec   Loss 3.0198   LearningRate 0.0045   Epoch: 15   Global Step: 195450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:01,097-Speed 3046.50 samples/sec   Loss 3.1040   LearningRate 0.0045   Epoch: 15   Global Step: 195460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:04,555-Speed 2962.54 samples/sec   Loss 2.9599   LearningRate 0.0045   Epoch: 15   Global Step: 195470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:08,030-Speed 2947.73 samples/sec   Loss 3.0630   LearningRate 0.0045   Epoch: 15   Global Step: 195480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:11,500-Speed 2951.87 samples/sec   Loss 3.1019   LearningRate 0.0045   Epoch: 15   Global Step: 195490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:14:14,906-Speed 3006.87 samples/sec   Loss 3.0904   LearningRate 0.0045   Epoch: 15   Global Step: 195500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:14:18,342-Speed 2981.22 samples/sec   Loss 3.0530   LearningRate 0.0045   Epoch: 15   Global Step: 195510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:14:21,727-Speed 3026.22 samples/sec   Loss 3.0447   LearningRate 0.0045   Epoch: 15   Global Step: 195520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:14:25,093-Speed 3042.69 samples/sec   Loss 3.0638   LearningRate 0.0045   Epoch: 15   Global Step: 195530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:28,502-Speed 3004.61 samples/sec   Loss 3.0708   LearningRate 0.0045   Epoch: 15   Global Step: 195540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:31,959-Speed 2963.27 samples/sec   Loss 2.9918   LearningRate 0.0045   Epoch: 15   Global Step: 195550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:35,395-Speed 2980.83 samples/sec   Loss 3.1296   LearningRate 0.0045   Epoch: 15   Global Step: 195560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:38,846-Speed 2967.69 samples/sec   Loss 3.0568   LearningRate 0.0045   Epoch: 15   Global Step: 195570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:42,279-Speed 2984.22 samples/sec   Loss 3.0334   LearningRate 0.0045   Epoch: 15   Global Step: 195580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:45,632-Speed 3054.58 samples/sec   Loss 3.0853   LearningRate 0.0045   Epoch: 15   Global Step: 195590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:49,020-Speed 3022.78 samples/sec   Loss 3.0952   LearningRate 0.0045   Epoch: 15   Global Step: 195600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:52,424-Speed 3009.61 samples/sec   Loss 3.0430   LearningRate 0.0045   Epoch: 15   Global Step: 195610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:55,836-Speed 3001.88 samples/sec   Loss 3.0278   LearningRate 0.0045   Epoch: 15   Global Step: 195620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:14:59,178-Speed 3065.16 samples/sec   Loss 3.0702   LearningRate 0.0045   Epoch: 15   Global Step: 195630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:15:02,566-Speed 3022.87 samples/sec   Loss 3.0848   LearningRate 0.0045   Epoch: 15   Global Step: 195640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:05,910-Speed 3063.68 samples/sec   Loss 3.0801   LearningRate 0.0045   Epoch: 15   Global Step: 195650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:09,268-Speed 3049.91 samples/sec   Loss 3.0575   LearningRate 0.0045   Epoch: 15   Global Step: 195660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:12,651-Speed 3028.08 samples/sec   Loss 2.9953   LearningRate 0.0045   Epoch: 15   Global Step: 195670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:16,075-Speed 2991.26 samples/sec   Loss 3.0220   LearningRate 0.0045   Epoch: 15   Global Step: 195680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:19,472-Speed 3014.85 samples/sec   Loss 3.0009   LearningRate 0.0045   Epoch: 15   Global Step: 195690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:22,960-Speed 2937.17 samples/sec   Loss 2.9626   LearningRate 0.0045   Epoch: 15   Global Step: 195700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:26,357-Speed 3015.52 samples/sec   Loss 3.0484   LearningRate 0.0045   Epoch: 15   Global Step: 195710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:29,791-Speed 2982.91 samples/sec   Loss 3.0833   LearningRate 0.0045   Epoch: 15   Global Step: 195720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:33,269-Speed 2945.60 samples/sec   Loss 3.0268   LearningRate 0.0045   Epoch: 15   Global Step: 195730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:36,578-Speed 3095.22 samples/sec   Loss 3.0914   LearningRate 0.0045   Epoch: 15   Global Step: 195740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:15:39,935-Speed 3051.12 samples/sec   Loss 3.0045   LearningRate 0.0045   Epoch: 15   Global Step: 195750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:43,443-Speed 2920.27 samples/sec   Loss 3.1150   LearningRate 0.0045   Epoch: 15   Global Step: 195760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:46,849-Speed 3007.22 samples/sec   Loss 3.0727   LearningRate 0.0045   Epoch: 15   Global Step: 195770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:50,336-Speed 2937.61 samples/sec   Loss 2.9847   LearningRate 0.0045   Epoch: 15   Global Step: 195780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:53,687-Speed 3056.38 samples/sec   Loss 3.1317   LearningRate 0.0045   Epoch: 15   Global Step: 195790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:15:57,097-Speed 3004.35 samples/sec   Loss 3.0386   LearningRate 0.0045   Epoch: 15   Global Step: 195800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:00,421-Speed 3081.66 samples/sec   Loss 3.1134   LearningRate 0.0045   Epoch: 15   Global Step: 195810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:03,841-Speed 2994.75 samples/sec   Loss 3.0826   LearningRate 0.0045   Epoch: 15   Global Step: 195820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:07,156-Speed 3089.98 samples/sec   Loss 3.0895   LearningRate 0.0045   Epoch: 15   Global Step: 195830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:10,501-Speed 3062.53 samples/sec   Loss 3.0592   LearningRate 0.0045   Epoch: 15   Global Step: 195840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:13,867-Speed 3043.28 samples/sec   Loss 3.0698   LearningRate 0.0045   Epoch: 15   Global Step: 195850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:16:17,193-Speed 3079.12 samples/sec   Loss 3.0438   LearningRate 0.0045   Epoch: 15   Global Step: 195860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:16:20,568-Speed 3035.47 samples/sec   Loss 3.0981   LearningRate 0.0045   Epoch: 15   Global Step: 195870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:16:23,950-Speed 3028.43 samples/sec   Loss 3.0601   LearningRate 0.0045   Epoch: 15   Global Step: 195880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:16:27,324-Speed 3035.52 samples/sec   Loss 3.0881   LearningRate 0.0045   Epoch: 15   Global Step: 195890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:30,752-Speed 2988.54 samples/sec   Loss 3.0514   LearningRate 0.0045   Epoch: 15   Global Step: 195900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:34,161-Speed 3004.36 samples/sec   Loss 3.0657   LearningRate 0.0045   Epoch: 15   Global Step: 195910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:37,496-Speed 3071.89 samples/sec   Loss 3.0985   LearningRate 0.0045   Epoch: 15   Global Step: 195920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:40,836-Speed 3066.89 samples/sec   Loss 3.0366   LearningRate 0.0045   Epoch: 15   Global Step: 195930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:44,231-Speed 3016.66 samples/sec   Loss 3.0444   LearningRate 0.0045   Epoch: 15   Global Step: 195940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:47,581-Speed 3057.60 samples/sec   Loss 3.0397   LearningRate 0.0045   Epoch: 15   Global Step: 195950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:50,969-Speed 3023.73 samples/sec   Loss 3.0605   LearningRate 0.0045   Epoch: 15   Global Step: 195960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:54,368-Speed 3013.29 samples/sec   Loss 3.1018   LearningRate 0.0045   Epoch: 15   Global Step: 195970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:16:57,721-Speed 3055.55 samples/sec   Loss 3.0579   LearningRate 0.0045   Epoch: 15   Global Step: 195980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:17:01,083-Speed 3046.22 samples/sec   Loss 3.0894   LearningRate 0.0045   Epoch: 15   Global Step: 195990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:17:04,430-Speed 3059.84 samples/sec   Loss 3.0614   LearningRate 0.0045   Epoch: 15   Global Step: 196000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:17:07,767-Speed 3070.05 samples/sec   Loss 3.0589   LearningRate 0.0045   Epoch: 15   Global Step: 196010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:17:11,205-Speed 2979.34 samples/sec   Loss 3.0657   LearningRate 0.0044   Epoch: 15   Global Step: 196020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:17:14,590-Speed 3025.73 samples/sec   Loss 3.0373   LearningRate 0.0044   Epoch: 15   Global Step: 196030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:17:17,992-Speed 3013.64 samples/sec   Loss 3.0900   LearningRate 0.0044   Epoch: 15   Global Step: 196040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:17:21,345-Speed 3055.16 samples/sec   Loss 3.0572   LearningRate 0.0044   Epoch: 15   Global Step: 196050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:17:24,742-Speed 3014.91 samples/sec   Loss 3.1274   LearningRate 0.0044   Epoch: 15   Global Step: 196060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:17:28,104-Speed 3046.14 samples/sec   Loss 3.1602   LearningRate 0.0044   Epoch: 15   Global Step: 196070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:17:31,481-Speed 3033.88 samples/sec   Loss 3.0738   LearningRate 0.0044   Epoch: 15   Global Step: 196080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:17:34,870-Speed 3022.22 samples/sec   Loss 3.0930   LearningRate 0.0044   Epoch: 15   Global Step: 196090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:17:38,269-Speed 3013.95 samples/sec   Loss 3.0902   LearningRate 0.0044   Epoch: 15   Global Step: 196100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:17:41,596-Speed 3078.21 samples/sec   Loss 3.0919   LearningRate 0.0044   Epoch: 15   Global Step: 196110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:17:44,949-Speed 3054.76 samples/sec   Loss 3.1485   LearningRate 0.0044   Epoch: 15   Global Step: 196120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:17:48,330-Speed 3029.95 samples/sec   Loss 2.9874   LearningRate 0.0044   Epoch: 15   Global Step: 196130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:17:51,714-Speed 3026.40 samples/sec   Loss 3.0605   LearningRate 0.0044   Epoch: 15   Global Step: 196140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:17:55,116-Speed 3010.89 samples/sec   Loss 3.1014   LearningRate 0.0044   Epoch: 15   Global Step: 196150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:17:58,583-Speed 2954.54 samples/sec   Loss 2.9878   LearningRate 0.0044   Epoch: 15   Global Step: 196160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:02,020-Speed 2980.75 samples/sec   Loss 3.0680   LearningRate 0.0044   Epoch: 15   Global Step: 196170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:05,339-Speed 3085.61 samples/sec   Loss 3.0864   LearningRate 0.0044   Epoch: 15   Global Step: 196180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:08,760-Speed 2994.81 samples/sec   Loss 3.0836   LearningRate 0.0044   Epoch: 15   Global Step: 196190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:12,159-Speed 3013.17 samples/sec   Loss 3.1404   LearningRate 0.0044   Epoch: 15   Global Step: 196200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:15,552-Speed 3018.54 samples/sec   Loss 3.1040   LearningRate 0.0044   Epoch: 15   Global Step: 196210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:18,984-Speed 2985.42 samples/sec   Loss 3.0940   LearningRate 0.0044   Epoch: 15   Global Step: 196220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:22,400-Speed 2998.57 samples/sec   Loss 3.0729   LearningRate 0.0044   Epoch: 15   Global Step: 196230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:25,734-Speed 3071.74 samples/sec   Loss 3.0466   LearningRate 0.0044   Epoch: 15   Global Step: 196240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:29,123-Speed 3021.98 samples/sec   Loss 3.0515   LearningRate 0.0044   Epoch: 15   Global Step: 196250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:32,504-Speed 3029.83 samples/sec   Loss 3.0724   LearningRate 0.0044   Epoch: 15   Global Step: 196260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:35,978-Speed 2948.85 samples/sec   Loss 3.1294   LearningRate 0.0044   Epoch: 15   Global Step: 196270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:39,326-Speed 3059.28 samples/sec   Loss 3.0181   LearningRate 0.0044   Epoch: 15   Global Step: 196280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:42,699-Speed 3036.91 samples/sec   Loss 2.9989   LearningRate 0.0044   Epoch: 15   Global Step: 196290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:46,107-Speed 3005.29 samples/sec   Loss 3.1274   LearningRate 0.0044   Epoch: 15   Global Step: 196300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:49,511-Speed 3008.62 samples/sec   Loss 2.9668   LearningRate 0.0044   Epoch: 15   Global Step: 196310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:52,962-Speed 2968.35 samples/sec   Loss 3.0515   LearningRate 0.0044   Epoch: 15   Global Step: 196320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:56,358-Speed 3016.30 samples/sec   Loss 3.0738   LearningRate 0.0044   Epoch: 15   Global Step: 196330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:18:59,672-Speed 3091.08 samples/sec   Loss 3.1473   LearningRate 0.0044   Epoch: 15   Global Step: 196340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:03,113-Speed 2976.86 samples/sec   Loss 3.1166   LearningRate 0.0044   Epoch: 15   Global Step: 196350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:06,552-Speed 2978.47 samples/sec   Loss 3.0985   LearningRate 0.0044   Epoch: 15   Global Step: 196360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:10,008-Speed 2963.97 samples/sec   Loss 3.0551   LearningRate 0.0044   Epoch: 15   Global Step: 196370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:13,479-Speed 2951.43 samples/sec   Loss 3.0690   LearningRate 0.0044   Epoch: 15   Global Step: 196380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:16,893-Speed 3000.17 samples/sec   Loss 3.0702   LearningRate 0.0044   Epoch: 15   Global Step: 196390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:20,245-Speed 3055.69 samples/sec   Loss 3.0463   LearningRate 0.0044   Epoch: 15   Global Step: 196400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:23,654-Speed 3004.20 samples/sec   Loss 3.0997   LearningRate 0.0044   Epoch: 15   Global Step: 196410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:27,086-Speed 2984.38 samples/sec   Loss 3.1095   LearningRate 0.0044   Epoch: 15   Global Step: 196420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:30,462-Speed 3033.93 samples/sec   Loss 3.0799   LearningRate 0.0044   Epoch: 15   Global Step: 196430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:33,824-Speed 3046.37 samples/sec   Loss 3.0540   LearningRate 0.0044   Epoch: 15   Global Step: 196440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:19:37,163-Speed 3068.21 samples/sec   Loss 3.0268   LearningRate 0.0044   Epoch: 15   Global Step: 196450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:19:40,576-Speed 3000.96 samples/sec   Loss 3.0688   LearningRate 0.0044   Epoch: 15   Global Step: 196460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:19:43,953-Speed 3033.19 samples/sec   Loss 2.9914   LearningRate 0.0044   Epoch: 15   Global Step: 196470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:47,307-Speed 3053.83 samples/sec   Loss 3.0079   LearningRate 0.0044   Epoch: 15   Global Step: 196480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:50,685-Speed 3032.12 samples/sec   Loss 3.0488   LearningRate 0.0044   Epoch: 15   Global Step: 196490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:54,070-Speed 3026.38 samples/sec   Loss 3.0568   LearningRate 0.0044   Epoch: 15   Global Step: 196500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:19:57,402-Speed 3073.97 samples/sec   Loss 3.0679   LearningRate 0.0044   Epoch: 15   Global Step: 196510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:20:00,755-Speed 3054.76 samples/sec   Loss 3.0791   LearningRate 0.0044   Epoch: 15   Global Step: 196520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:20:04,149-Speed 3018.09 samples/sec   Loss 3.0244   LearningRate 0.0044   Epoch: 15   Global Step: 196530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:20:07,534-Speed 3025.76 samples/sec   Loss 3.1166   LearningRate 0.0044   Epoch: 15   Global Step: 196540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:20:10,854-Speed 3085.42 samples/sec   Loss 3.0758   LearningRate 0.0044   Epoch: 15   Global Step: 196550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:20:14,212-Speed 3050.51 samples/sec   Loss 3.0896   LearningRate 0.0044   Epoch: 15   Global Step: 196560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:20:17,545-Speed 3073.87 samples/sec   Loss 3.0727   LearningRate 0.0044   Epoch: 15   Global Step: 196570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:20:20,926-Speed 3029.27 samples/sec   Loss 3.0707   LearningRate 0.0044   Epoch: 15   Global Step: 196580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:20:24,349-Speed 2992.18 samples/sec   Loss 3.1149   LearningRate 0.0044   Epoch: 15   Global Step: 196590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:20:27,740-Speed 3021.12 samples/sec   Loss 3.0603   LearningRate 0.0044   Epoch: 15   Global Step: 196600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:20:31,111-Speed 3038.38 samples/sec   Loss 3.0777   LearningRate 0.0043   Epoch: 15   Global Step: 196610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:20:34,494-Speed 3028.44 samples/sec   Loss 3.0848   LearningRate 0.0043   Epoch: 15   Global Step: 196620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:20:37,900-Speed 3007.00 samples/sec   Loss 3.1357   LearningRate 0.0043   Epoch: 15   Global Step: 196630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:20:41,247-Speed 3060.20 samples/sec   Loss 3.1408   LearningRate 0.0043   Epoch: 15   Global Step: 196640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:20:44,660-Speed 3001.02 samples/sec   Loss 3.0569   LearningRate 0.0043   Epoch: 15   Global Step: 196650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:20:48,015-Speed 3053.84 samples/sec   Loss 3.0620   LearningRate 0.0043   Epoch: 15   Global Step: 196660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:20:51,373-Speed 3050.06 samples/sec   Loss 3.1172   LearningRate 0.0043   Epoch: 15   Global Step: 196670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:20:54,737-Speed 3045.45 samples/sec   Loss 3.1086   LearningRate 0.0043   Epoch: 15   Global Step: 196680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:20:58,131-Speed 3017.82 samples/sec   Loss 3.0255   LearningRate 0.0043   Epoch: 15   Global Step: 196690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:01,618-Speed 2936.67 samples/sec   Loss 3.0067   LearningRate 0.0043   Epoch: 15   Global Step: 196700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:05,121-Speed 2924.14 samples/sec   Loss 3.0602   LearningRate 0.0043   Epoch: 15   Global Step: 196710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:08,454-Speed 3073.14 samples/sec   Loss 3.0349   LearningRate 0.0043   Epoch: 15   Global Step: 196720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:11,817-Speed 3045.59 samples/sec   Loss 3.0703   LearningRate 0.0043   Epoch: 15   Global Step: 196730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:21:15,148-Speed 3075.52 samples/sec   Loss 3.0861   LearningRate 0.0043   Epoch: 15   Global Step: 196740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:18,545-Speed 3015.01 samples/sec   Loss 3.0663   LearningRate 0.0043   Epoch: 15   Global Step: 196750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:21,938-Speed 3019.26 samples/sec   Loss 2.9997   LearningRate 0.0043   Epoch: 15   Global Step: 196760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:25,379-Speed 2976.37 samples/sec   Loss 3.0490   LearningRate 0.0043   Epoch: 15   Global Step: 196770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:28,789-Speed 3003.91 samples/sec   Loss 3.0898   LearningRate 0.0043   Epoch: 15   Global Step: 196780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:32,119-Speed 3075.83 samples/sec   Loss 3.1050   LearningRate 0.0043   Epoch: 15   Global Step: 196790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:35,568-Speed 2970.15 samples/sec   Loss 3.0922   LearningRate 0.0043   Epoch: 15   Global Step: 196800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:39,078-Speed 2918.12 samples/sec   Loss 3.0822   LearningRate 0.0043   Epoch: 15   Global Step: 196810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:42,465-Speed 3024.38 samples/sec   Loss 3.0417   LearningRate 0.0043   Epoch: 15   Global Step: 196820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:45,832-Speed 3041.75 samples/sec   Loss 3.0539   LearningRate 0.0043   Epoch: 15   Global Step: 196830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:49,201-Speed 3040.50 samples/sec   Loss 3.1570   LearningRate 0.0043   Epoch: 15   Global Step: 196840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:21:52,542-Speed 3066.60 samples/sec   Loss 3.0729   LearningRate 0.0043   Epoch: 15   Global Step: 196850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:55,958-Speed 2999.54 samples/sec   Loss 3.1109   LearningRate 0.0043   Epoch: 15   Global Step: 196860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:21:59,300-Speed 3064.72 samples/sec   Loss 3.0850   LearningRate 0.0043   Epoch: 15   Global Step: 196870   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 20:22:02,697-Speed 3014.87 samples/sec   Loss 3.0266   LearningRate 0.0043   Epoch: 15   Global Step: 196880   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 20:22:06,125-Speed 2987.90 samples/sec   Loss 2.9587   LearningRate 0.0043   Epoch: 15   Global Step: 196890   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 20:22:09,472-Speed 3060.50 samples/sec   Loss 2.9991   LearningRate 0.0043   Epoch: 15   Global Step: 196900   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 20:22:12,867-Speed 3017.37 samples/sec   Loss 3.0165   LearningRate 0.0043   Epoch: 15   Global Step: 196910   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 20:22:16,209-Speed 3064.82 samples/sec   Loss 3.1122   LearningRate 0.0043   Epoch: 15   Global Step: 196920   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 20:22:19,592-Speed 3027.40 samples/sec   Loss 3.1140   LearningRate 0.0043   Epoch: 15   Global Step: 196930   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 20:22:23,033-Speed 2976.98 samples/sec   Loss 3.0594   LearningRate 0.0043   Epoch: 15   Global Step: 196940   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 20:22:26,396-Speed 3046.17 samples/sec   Loss 2.9963   LearningRate 0.0043   Epoch: 15   Global Step: 196950   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 20:22:29,826-Speed 2985.94 samples/sec   Loss 3.0769   LearningRate 0.0043   Epoch: 15   Global Step: 196960   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 20:22:33,298-Speed 2949.98 samples/sec   Loss 3.0482   LearningRate 0.0043   Epoch: 15   Global Step: 196970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:22:36,648-Speed 3057.45 samples/sec   Loss 3.0027   LearningRate 0.0043   Epoch: 15   Global Step: 196980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:22:40,032-Speed 3027.03 samples/sec   Loss 3.0567   LearningRate 0.0043   Epoch: 15   Global Step: 196990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:22:43,492-Speed 2960.66 samples/sec   Loss 3.0872   LearningRate 0.0043   Epoch: 15   Global Step: 197000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:22:46,831-Speed 3067.58 samples/sec   Loss 3.0660   LearningRate 0.0043   Epoch: 15   Global Step: 197010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:22:50,228-Speed 3014.96 samples/sec   Loss 3.0881   LearningRate 0.0043   Epoch: 15   Global Step: 197020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:22:53,696-Speed 2953.51 samples/sec   Loss 3.0311   LearningRate 0.0043   Epoch: 15   Global Step: 197030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:22:57,077-Speed 3029.36 samples/sec   Loss 3.1421   LearningRate 0.0043   Epoch: 15   Global Step: 197040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:23:00,435-Speed 3049.97 samples/sec   Loss 3.0965   LearningRate 0.0043   Epoch: 15   Global Step: 197050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:23:03,932-Speed 2929.38 samples/sec   Loss 2.9900   LearningRate 0.0043   Epoch: 15   Global Step: 197060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:23:07,337-Speed 3007.29 samples/sec   Loss 3.0580   LearningRate 0.0043   Epoch: 15   Global Step: 197070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:23:10,740-Speed 3010.43 samples/sec   Loss 3.0579   LearningRate 0.0043   Epoch: 15   Global Step: 197080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:23:14,185-Speed 2973.57 samples/sec   Loss 3.0887   LearningRate 0.0043   Epoch: 15   Global Step: 197090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:23:17,646-Speed 2959.09 samples/sec   Loss 3.1065   LearningRate 0.0043   Epoch: 15   Global Step: 197100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:23:21,063-Speed 2997.28 samples/sec   Loss 3.0855   LearningRate 0.0043   Epoch: 15   Global Step: 197110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:23:24,473-Speed 3003.57 samples/sec   Loss 3.1105   LearningRate 0.0043   Epoch: 15   Global Step: 197120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:23:27,856-Speed 3027.80 samples/sec   Loss 3.0087   LearningRate 0.0043   Epoch: 15   Global Step: 197130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:23:31,291-Speed 2981.95 samples/sec   Loss 3.0475   LearningRate 0.0043   Epoch: 15   Global Step: 197140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:23:34,617-Speed 3079.54 samples/sec   Loss 3.0578   LearningRate 0.0043   Epoch: 15   Global Step: 197150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:23:37,951-Speed 3072.46 samples/sec   Loss 3.0544   LearningRate 0.0043   Epoch: 15   Global Step: 197160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:23:41,379-Speed 2987.86 samples/sec   Loss 3.0797   LearningRate 0.0043   Epoch: 15   Global Step: 197170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:23:44,825-Speed 2971.65 samples/sec   Loss 3.0547   LearningRate 0.0043   Epoch: 15   Global Step: 197180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:23:48,317-Speed 2933.26 samples/sec   Loss 3.0310   LearningRate 0.0043   Epoch: 15   Global Step: 197190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:23:51,731-Speed 3001.02 samples/sec   Loss 3.0673   LearningRate 0.0043   Epoch: 15   Global Step: 197200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:23:55,175-Speed 2974.05 samples/sec   Loss 3.0539   LearningRate 0.0042   Epoch: 15   Global Step: 197210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:23:58,571-Speed 3015.73 samples/sec   Loss 3.0414   LearningRate 0.0042   Epoch: 15   Global Step: 197220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:24:02,021-Speed 2969.22 samples/sec   Loss 2.9969   LearningRate 0.0042   Epoch: 15   Global Step: 197230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:24:05,384-Speed 3044.96 samples/sec   Loss 3.0831   LearningRate 0.0042   Epoch: 15   Global Step: 197240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:24:08,741-Speed 3052.28 samples/sec   Loss 3.0477   LearningRate 0.0042   Epoch: 15   Global Step: 197250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:24:12,158-Speed 2997.09 samples/sec   Loss 3.0369   LearningRate 0.0042   Epoch: 15   Global Step: 197260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:15,568-Speed 3004.27 samples/sec   Loss 3.1077   LearningRate 0.0042   Epoch: 15   Global Step: 197270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:18,999-Speed 2984.54 samples/sec   Loss 3.0305   LearningRate 0.0042   Epoch: 15   Global Step: 197280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:22,498-Speed 2928.36 samples/sec   Loss 3.0809   LearningRate 0.0042   Epoch: 15   Global Step: 197290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:25,916-Speed 2995.92 samples/sec   Loss 3.0217   LearningRate 0.0042   Epoch: 15   Global Step: 197300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:29,408-Speed 2933.70 samples/sec   Loss 3.0705   LearningRate 0.0042   Epoch: 15   Global Step: 197310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:32,777-Speed 3039.97 samples/sec   Loss 3.1147   LearningRate 0.0042   Epoch: 15   Global Step: 197320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:36,219-Speed 2976.20 samples/sec   Loss 3.0470   LearningRate 0.0042   Epoch: 15   Global Step: 197330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:39,530-Speed 3093.83 samples/sec   Loss 2.9840   LearningRate 0.0042   Epoch: 15   Global Step: 197340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:42,904-Speed 3035.07 samples/sec   Loss 2.9781   LearningRate 0.0042   Epoch: 15   Global Step: 197350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:46,349-Speed 2973.13 samples/sec   Loss 3.0775   LearningRate 0.0042   Epoch: 15   Global Step: 197360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:49,801-Speed 2966.91 samples/sec   Loss 3.0545   LearningRate 0.0042   Epoch: 15   Global Step: 197370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:53,221-Speed 2996.04 samples/sec   Loss 3.0917   LearningRate 0.0042   Epoch: 15   Global Step: 197380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:56,571-Speed 3057.05 samples/sec   Loss 3.1400   LearningRate 0.0042   Epoch: 15   Global Step: 197390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:24:59,984-Speed 3001.24 samples/sec   Loss 3.1064   LearningRate 0.0042   Epoch: 15   Global Step: 197400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:25:03,441-Speed 2963.31 samples/sec   Loss 3.0625   LearningRate 0.0042   Epoch: 15   Global Step: 197410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:25:06,917-Speed 2945.85 samples/sec   Loss 3.0179   LearningRate 0.0042   Epoch: 15   Global Step: 197420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:25:10,364-Speed 2972.60 samples/sec   Loss 3.0682   LearningRate 0.0042   Epoch: 15   Global Step: 197430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:25:13,747-Speed 3027.38 samples/sec   Loss 3.0479   LearningRate 0.0042   Epoch: 15   Global Step: 197440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:25:17,051-Speed 3099.93 samples/sec   Loss 3.0702   LearningRate 0.0042   Epoch: 15   Global Step: 197450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:25:20,382-Speed 3075.75 samples/sec   Loss 3.0701   LearningRate 0.0042   Epoch: 15   Global Step: 197460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:25:23,784-Speed 3010.30 samples/sec   Loss 3.1121   LearningRate 0.0042   Epoch: 15   Global Step: 197470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:25:27,254-Speed 2951.53 samples/sec   Loss 3.1157   LearningRate 0.0042   Epoch: 15   Global Step: 197480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:25:30,632-Speed 3032.37 samples/sec   Loss 3.1127   LearningRate 0.0042   Epoch: 15   Global Step: 197490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:25:34,064-Speed 2985.32 samples/sec   Loss 3.0419   LearningRate 0.0042   Epoch: 15   Global Step: 197500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:25:37,408-Speed 3062.76 samples/sec   Loss 3.1187   LearningRate 0.0042   Epoch: 15   Global Step: 197510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:25:40,757-Speed 3058.68 samples/sec   Loss 3.0036   LearningRate 0.0042   Epoch: 15   Global Step: 197520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:25:44,127-Speed 3038.59 samples/sec   Loss 2.9618   LearningRate 0.0042   Epoch: 15   Global Step: 197530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:25:47,489-Speed 3047.16 samples/sec   Loss 3.0170   LearningRate 0.0042   Epoch: 15   Global Step: 197540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:25:50,923-Speed 2982.77 samples/sec   Loss 3.0672   LearningRate 0.0042   Epoch: 15   Global Step: 197550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:25:54,283-Speed 3048.46 samples/sec   Loss 2.9563   LearningRate 0.0042   Epoch: 15   Global Step: 197560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:25:57,653-Speed 3038.95 samples/sec   Loss 3.0456   LearningRate 0.0042   Epoch: 15   Global Step: 197570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:26:01,023-Speed 3039.30 samples/sec   Loss 3.0974   LearningRate 0.0042   Epoch: 15   Global Step: 197580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:26:04,410-Speed 3024.48 samples/sec   Loss 3.0619   LearningRate 0.0042   Epoch: 15   Global Step: 197590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:26:07,802-Speed 3019.94 samples/sec   Loss 3.0962   LearningRate 0.0042   Epoch: 15   Global Step: 197600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:26:11,170-Speed 3040.54 samples/sec   Loss 2.9898   LearningRate 0.0042   Epoch: 15   Global Step: 197610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:26:14,556-Speed 3025.51 samples/sec   Loss 3.0081   LearningRate 0.0042   Epoch: 15   Global Step: 197620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:26:17,870-Speed 3090.87 samples/sec   Loss 3.0133   LearningRate 0.0042   Epoch: 15   Global Step: 197630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:26:21,275-Speed 3007.59 samples/sec   Loss 3.0230   LearningRate 0.0042   Epoch: 15   Global Step: 197640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:26:24,677-Speed 3010.83 samples/sec   Loss 3.1344   LearningRate 0.0042   Epoch: 15   Global Step: 197650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:26:28,019-Speed 3065.42 samples/sec   Loss 3.0929   LearningRate 0.0042   Epoch: 15   Global Step: 197660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:26:31,377-Speed 3050.35 samples/sec   Loss 3.1397   LearningRate 0.0042   Epoch: 15   Global Step: 197670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:26:34,788-Speed 3002.82 samples/sec   Loss 3.0338   LearningRate 0.0042   Epoch: 15   Global Step: 197680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:26:38,184-Speed 3016.22 samples/sec   Loss 3.0014   LearningRate 0.0042   Epoch: 15   Global Step: 197690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:26:41,569-Speed 3026.13 samples/sec   Loss 3.0505   LearningRate 0.0042   Epoch: 15   Global Step: 197700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:26:44,954-Speed 3025.33 samples/sec   Loss 3.0442   LearningRate 0.0042   Epoch: 15   Global Step: 197710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:26:48,384-Speed 2986.57 samples/sec   Loss 3.0602   LearningRate 0.0042   Epoch: 15   Global Step: 197720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:26:51,808-Speed 2991.30 samples/sec   Loss 3.1712   LearningRate 0.0042   Epoch: 15   Global Step: 197730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:26:55,122-Speed 3091.13 samples/sec   Loss 2.9913   LearningRate 0.0042   Epoch: 15   Global Step: 197740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:26:58,562-Speed 2977.58 samples/sec   Loss 3.0972   LearningRate 0.0042   Epoch: 15   Global Step: 197750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:27:01,957-Speed 3016.82 samples/sec   Loss 3.0166   LearningRate 0.0042   Epoch: 15   Global Step: 197760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:27:05,336-Speed 3031.45 samples/sec   Loss 3.0343   LearningRate 0.0042   Epoch: 15   Global Step: 197770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:27:08,746-Speed 3004.18 samples/sec   Loss 2.9737   LearningRate 0.0042   Epoch: 15   Global Step: 197780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:27:12,209-Speed 2957.04 samples/sec   Loss 3.0525   LearningRate 0.0042   Epoch: 15   Global Step: 197790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:27:15,630-Speed 2994.10 samples/sec   Loss 3.0134   LearningRate 0.0042   Epoch: 15   Global Step: 197800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:27:19,100-Speed 2952.38 samples/sec   Loss 2.9714   LearningRate 0.0042   Epoch: 15   Global Step: 197810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:27:22,491-Speed 3020.75 samples/sec   Loss 3.0154   LearningRate 0.0041   Epoch: 15   Global Step: 197820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:27:25,964-Speed 2949.45 samples/sec   Loss 3.1010   LearningRate 0.0041   Epoch: 15   Global Step: 197830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:27:29,377-Speed 3000.63 samples/sec   Loss 3.0076   LearningRate 0.0041   Epoch: 15   Global Step: 197840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:27:32,772-Speed 3017.06 samples/sec   Loss 3.1433   LearningRate 0.0041   Epoch: 15   Global Step: 197850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:27:36,144-Speed 3037.50 samples/sec   Loss 3.0643   LearningRate 0.0041   Epoch: 15   Global Step: 197860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:27:39,560-Speed 2998.88 samples/sec   Loss 2.9854   LearningRate 0.0041   Epoch: 15   Global Step: 197870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:27:42,917-Speed 3051.43 samples/sec   Loss 3.0720   LearningRate 0.0041   Epoch: 15   Global Step: 197880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:27:46,391-Speed 2948.14 samples/sec   Loss 3.0836   LearningRate 0.0041   Epoch: 15   Global Step: 197890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:27:49,878-Speed 2937.93 samples/sec   Loss 3.0146   LearningRate 0.0041   Epoch: 15   Global Step: 197900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:27:53,337-Speed 2960.57 samples/sec   Loss 3.0437   LearningRate 0.0041   Epoch: 15   Global Step: 197910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:27:56,742-Speed 3008.72 samples/sec   Loss 3.1107   LearningRate 0.0041   Epoch: 15   Global Step: 197920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:28:00,158-Speed 2997.72 samples/sec   Loss 3.0340   LearningRate 0.0041   Epoch: 15   Global Step: 197930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:28:03,543-Speed 3025.99 samples/sec   Loss 3.0438   LearningRate 0.0041   Epoch: 15   Global Step: 197940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:28:06,864-Speed 3084.68 samples/sec   Loss 3.0252   LearningRate 0.0041   Epoch: 15   Global Step: 197950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:28:10,195-Speed 3074.94 samples/sec   Loss 3.0556   LearningRate 0.0041   Epoch: 15   Global Step: 197960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:28:13,519-Speed 3081.33 samples/sec   Loss 2.9667   LearningRate 0.0041   Epoch: 15   Global Step: 197970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:28:16,818-Speed 3104.25 samples/sec   Loss 3.0950   LearningRate 0.0041   Epoch: 15   Global Step: 197980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:28:20,262-Speed 2974.56 samples/sec   Loss 3.0592   LearningRate 0.0041   Epoch: 15   Global Step: 197990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:28:23,712-Speed 2969.88 samples/sec   Loss 2.9403   LearningRate 0.0041   Epoch: 15   Global Step: 198000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:28:27,120-Speed 3005.67 samples/sec   Loss 3.0413   LearningRate 0.0041   Epoch: 15   Global Step: 198010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:28:30,612-Speed 2932.93 samples/sec   Loss 3.1143   LearningRate 0.0041   Epoch: 15   Global Step: 198020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:28:34,013-Speed 3011.59 samples/sec   Loss 3.1426   LearningRate 0.0041   Epoch: 15   Global Step: 198030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:28:37,383-Speed 3039.52 samples/sec   Loss 3.0427   LearningRate 0.0041   Epoch: 15   Global Step: 198040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:28:40,785-Speed 3010.76 samples/sec   Loss 3.1729   LearningRate 0.0041   Epoch: 15   Global Step: 198050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:28:44,169-Speed 3027.42 samples/sec   Loss 3.0542   LearningRate 0.0041   Epoch: 15   Global Step: 198060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:28:47,573-Speed 3008.24 samples/sec   Loss 3.0435   LearningRate 0.0041   Epoch: 15   Global Step: 198070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:28:50,944-Speed 3038.57 samples/sec   Loss 2.9685   LearningRate 0.0041   Epoch: 15   Global Step: 198080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:28:54,328-Speed 3026.78 samples/sec   Loss 3.0237   LearningRate 0.0041   Epoch: 15   Global Step: 198090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:28:57,756-Speed 2988.23 samples/sec   Loss 3.0884   LearningRate 0.0041   Epoch: 15   Global Step: 198100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:29:01,160-Speed 3009.34 samples/sec   Loss 3.1207   LearningRate 0.0041   Epoch: 15   Global Step: 198110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:29:04,595-Speed 2982.54 samples/sec   Loss 3.0956   LearningRate 0.0041   Epoch: 15   Global Step: 198120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:29:07,978-Speed 3027.69 samples/sec   Loss 3.0914   LearningRate 0.0041   Epoch: 15   Global Step: 198130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:29:11,386-Speed 3005.93 samples/sec   Loss 3.0145   LearningRate 0.0041   Epoch: 15   Global Step: 198140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:29:14,770-Speed 3026.32 samples/sec   Loss 3.0099   LearningRate 0.0041   Epoch: 15   Global Step: 198150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:29:18,132-Speed 3047.21 samples/sec   Loss 3.0619   LearningRate 0.0041   Epoch: 15   Global Step: 198160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:29:21,496-Speed 3044.39 samples/sec   Loss 3.0337   LearningRate 0.0041   Epoch: 15   Global Step: 198170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:29:24,864-Speed 3041.43 samples/sec   Loss 2.9657   LearningRate 0.0041   Epoch: 15   Global Step: 198180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:29:28,242-Speed 3031.94 samples/sec   Loss 3.0440   LearningRate 0.0041   Epoch: 15   Global Step: 198190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:29:31,664-Speed 2993.34 samples/sec   Loss 3.1012   LearningRate 0.0041   Epoch: 15   Global Step: 198200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:29:35,035-Speed 3038.76 samples/sec   Loss 2.9444   LearningRate 0.0041   Epoch: 15   Global Step: 198210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:29:38,363-Speed 3078.35 samples/sec   Loss 3.0283   LearningRate 0.0041   Epoch: 15   Global Step: 198220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:29:41,723-Speed 3049.09 samples/sec   Loss 2.9612   LearningRate 0.0041   Epoch: 15   Global Step: 198230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:29:45,153-Speed 2986.14 samples/sec   Loss 3.0148   LearningRate 0.0041   Epoch: 15   Global Step: 198240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:29:48,608-Speed 2964.16 samples/sec   Loss 3.0541   LearningRate 0.0041   Epoch: 15   Global Step: 198250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:29:51,968-Speed 3049.14 samples/sec   Loss 3.1204   LearningRate 0.0041   Epoch: 15   Global Step: 198260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:29:55,399-Speed 2985.58 samples/sec   Loss 3.0501   LearningRate 0.0041   Epoch: 15   Global Step: 198270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:29:58,799-Speed 3011.96 samples/sec   Loss 3.0242   LearningRate 0.0041   Epoch: 15   Global Step: 198280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:02,193-Speed 3018.48 samples/sec   Loss 3.1167   LearningRate 0.0041   Epoch: 15   Global Step: 198290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:05,594-Speed 3011.45 samples/sec   Loss 2.9232   LearningRate 0.0041   Epoch: 15   Global Step: 198300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:08,970-Speed 3033.86 samples/sec   Loss 3.0627   LearningRate 0.0041   Epoch: 15   Global Step: 198310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:12,320-Speed 3057.64 samples/sec   Loss 3.0418   LearningRate 0.0041   Epoch: 15   Global Step: 198320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:15,762-Speed 2975.88 samples/sec   Loss 3.0444   LearningRate 0.0041   Epoch: 15   Global Step: 198330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:30:19,084-Speed 3083.13 samples/sec   Loss 3.0800   LearningRate 0.0041   Epoch: 15   Global Step: 198340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:30:22,409-Speed 3080.77 samples/sec   Loss 3.0252   LearningRate 0.0041   Epoch: 15   Global Step: 198350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:30:25,710-Speed 3103.00 samples/sec   Loss 3.0271   LearningRate 0.0041   Epoch: 15   Global Step: 198360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:29,044-Speed 3072.51 samples/sec   Loss 3.0856   LearningRate 0.0041   Epoch: 15   Global Step: 198370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:32,416-Speed 3037.59 samples/sec   Loss 3.0669   LearningRate 0.0041   Epoch: 15   Global Step: 198380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:35,813-Speed 3014.42 samples/sec   Loss 3.0680   LearningRate 0.0041   Epoch: 15   Global Step: 198390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:39,171-Speed 3050.81 samples/sec   Loss 2.9964   LearningRate 0.0041   Epoch: 15   Global Step: 198400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:42,601-Speed 2986.38 samples/sec   Loss 3.0960   LearningRate 0.0041   Epoch: 15   Global Step: 198410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:45,944-Speed 3063.91 samples/sec   Loss 3.0979   LearningRate 0.0041   Epoch: 15   Global Step: 198420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:49,322-Speed 3032.58 samples/sec   Loss 2.9165   LearningRate 0.0040   Epoch: 15   Global Step: 198430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:52,711-Speed 3033.40 samples/sec   Loss 2.9721   LearningRate 0.0040   Epoch: 15   Global Step: 198440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:56,092-Speed 3029.51 samples/sec   Loss 2.9670   LearningRate 0.0040   Epoch: 15   Global Step: 198450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:30:59,469-Speed 3033.77 samples/sec   Loss 2.9626   LearningRate 0.0040   Epoch: 15   Global Step: 198460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:31:02,871-Speed 3010.83 samples/sec   Loss 2.9993   LearningRate 0.0040   Epoch: 15   Global Step: 198470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:31:06,257-Speed 3025.23 samples/sec   Loss 2.9274   LearningRate 0.0040   Epoch: 15   Global Step: 198480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:31:09,770-Speed 2915.57 samples/sec   Loss 2.9496   LearningRate 0.0040   Epoch: 15   Global Step: 198490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:31:13,154-Speed 3026.93 samples/sec   Loss 3.0761   LearningRate 0.0040   Epoch: 15   Global Step: 198500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:31:16,595-Speed 2976.64 samples/sec   Loss 3.0194   LearningRate 0.0040   Epoch: 15   Global Step: 198510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:31:20,048-Speed 2966.38 samples/sec   Loss 3.0972   LearningRate 0.0040   Epoch: 15   Global Step: 198520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:31:23,363-Speed 3089.62 samples/sec   Loss 2.9628   LearningRate 0.0040   Epoch: 15   Global Step: 198530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:31:26,764-Speed 3011.98 samples/sec   Loss 3.0197   LearningRate 0.0040   Epoch: 15   Global Step: 198540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:31:30,130-Speed 3043.10 samples/sec   Loss 2.9549   LearningRate 0.0040   Epoch: 15   Global Step: 198550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:31:33,503-Speed 3036.84 samples/sec   Loss 2.9811   LearningRate 0.0040   Epoch: 15   Global Step: 198560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:31:36,917-Speed 2999.82 samples/sec   Loss 3.1068   LearningRate 0.0040   Epoch: 15   Global Step: 198570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:31:40,325-Speed 3006.16 samples/sec   Loss 3.0658   LearningRate 0.0040   Epoch: 15   Global Step: 198580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:31:43,752-Speed 2988.66 samples/sec   Loss 3.0847   LearningRate 0.0040   Epoch: 15   Global Step: 198590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:31:47,075-Speed 3081.77 samples/sec   Loss 3.0376   LearningRate 0.0040   Epoch: 15   Global Step: 198600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:31:50,420-Speed 3062.48 samples/sec   Loss 3.0110   LearningRate 0.0040   Epoch: 15   Global Step: 198610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:31:53,777-Speed 3051.05 samples/sec   Loss 3.0501   LearningRate 0.0040   Epoch: 15   Global Step: 198620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:31:57,184-Speed 3006.42 samples/sec   Loss 3.0303   LearningRate 0.0040   Epoch: 15   Global Step: 198630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:32:00,564-Speed 3030.71 samples/sec   Loss 3.0632   LearningRate 0.0040   Epoch: 15   Global Step: 198640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:32:04,039-Speed 2947.47 samples/sec   Loss 3.0745   LearningRate 0.0040   Epoch: 15   Global Step: 198650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:32:07,444-Speed 3008.49 samples/sec   Loss 3.0606   LearningRate 0.0040   Epoch: 15   Global Step: 198660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:32:10,833-Speed 3022.36 samples/sec   Loss 3.0202   LearningRate 0.0040   Epoch: 15   Global Step: 198670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:32:14,148-Speed 3090.07 samples/sec   Loss 3.0642   LearningRate 0.0040   Epoch: 15   Global Step: 198680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:32:17,576-Speed 2987.60 samples/sec   Loss 3.0540   LearningRate 0.0040   Epoch: 15   Global Step: 198690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:32:21,000-Speed 2991.56 samples/sec   Loss 3.0443   LearningRate 0.0040   Epoch: 15   Global Step: 198700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:32:24,408-Speed 3006.77 samples/sec   Loss 3.0192   LearningRate 0.0040   Epoch: 15   Global Step: 198710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:32:27,799-Speed 3020.47 samples/sec   Loss 3.0681   LearningRate 0.0040   Epoch: 15   Global Step: 198720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:32:31,471-Speed 2789.63 samples/sec   Loss 2.9739   LearningRate 0.0040   Epoch: 15   Global Step: 198730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:33:03,994-Speed 314.87 samples/sec   Loss 2.5891   LearningRate 0.0040   Epoch: 16   Global Step: 198740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:33:07,356-Speed 3046.50 samples/sec   Loss 1.9987   LearningRate 0.0040   Epoch: 16   Global Step: 198750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:33:10,899-Speed 2891.60 samples/sec   Loss 2.0033   LearningRate 0.0040   Epoch: 16   Global Step: 198760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:33:14,297-Speed 3014.18 samples/sec   Loss 2.0097   LearningRate 0.0040   Epoch: 16   Global Step: 198770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:33:17,696-Speed 3014.14 samples/sec   Loss 1.9355   LearningRate 0.0040   Epoch: 16   Global Step: 198780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:33:21,066-Speed 3039.68 samples/sec   Loss 1.9545   LearningRate 0.0040   Epoch: 16   Global Step: 198790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:33:24,556-Speed 2934.78 samples/sec   Loss 1.9735   LearningRate 0.0040   Epoch: 16   Global Step: 198800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:33:27,977-Speed 2994.61 samples/sec   Loss 1.9689   LearningRate 0.0040   Epoch: 16   Global Step: 198810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:33:31,419-Speed 2976.48 samples/sec   Loss 2.0019   LearningRate 0.0040   Epoch: 16   Global Step: 198820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:33:35,224-Speed 2691.91 samples/sec   Loss 1.9245   LearningRate 0.0040   Epoch: 16   Global Step: 198830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:33:38,643-Speed 2996.51 samples/sec   Loss 1.9598   LearningRate 0.0040   Epoch: 16   Global Step: 198840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:33:42,028-Speed 3025.94 samples/sec   Loss 1.9370   LearningRate 0.0040   Epoch: 16   Global Step: 198850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:33:45,642-Speed 2834.45 samples/sec   Loss 1.9799   LearningRate 0.0040   Epoch: 16   Global Step: 198860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:33:48,997-Speed 3053.17 samples/sec   Loss 1.9380   LearningRate 0.0040   Epoch: 16   Global Step: 198870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:33:52,493-Speed 2929.77 samples/sec   Loss 1.9739   LearningRate 0.0040   Epoch: 16   Global Step: 198880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:33:55,885-Speed 3020.27 samples/sec   Loss 1.9504   LearningRate 0.0040   Epoch: 16   Global Step: 198890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:33:59,328-Speed 2974.81 samples/sec   Loss 1.9501   LearningRate 0.0040   Epoch: 16   Global Step: 198900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:34:02,783-Speed 2964.72 samples/sec   Loss 1.9244   LearningRate 0.0040   Epoch: 16   Global Step: 198910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:34:06,126-Speed 3063.64 samples/sec   Loss 1.9064   LearningRate 0.0040   Epoch: 16   Global Step: 198920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:34:09,457-Speed 3074.71 samples/sec   Loss 1.9806   LearningRate 0.0040   Epoch: 16   Global Step: 198930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:34:12,824-Speed 3043.12 samples/sec   Loss 1.9493   LearningRate 0.0040   Epoch: 16   Global Step: 198940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:34:16,247-Speed 2992.23 samples/sec   Loss 1.9922   LearningRate 0.0040   Epoch: 16   Global Step: 198950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:34:19,735-Speed 2936.09 samples/sec   Loss 1.9799   LearningRate 0.0040   Epoch: 16   Global Step: 198960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:34:23,138-Speed 3010.74 samples/sec   Loss 2.0003   LearningRate 0.0040   Epoch: 16   Global Step: 198970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:34:26,498-Speed 3048.64 samples/sec   Loss 1.9889   LearningRate 0.0040   Epoch: 16   Global Step: 198980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:34:29,830-Speed 3073.53 samples/sec   Loss 1.9501   LearningRate 0.0040   Epoch: 16   Global Step: 198990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:34:33,251-Speed 2994.51 samples/sec   Loss 1.9211   LearningRate 0.0040   Epoch: 16   Global Step: 199000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:34:36,596-Speed 3061.65 samples/sec   Loss 1.9562   LearningRate 0.0040   Epoch: 16   Global Step: 199010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:34:39,979-Speed 3028.32 samples/sec   Loss 1.9902   LearningRate 0.0040   Epoch: 16   Global Step: 199020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:34:43,299-Speed 3085.58 samples/sec   Loss 2.0287   LearningRate 0.0040   Epoch: 16   Global Step: 199030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:34:46,635-Speed 3069.89 samples/sec   Loss 2.0304   LearningRate 0.0040   Epoch: 16   Global Step: 199040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:34:50,035-Speed 3012.36 samples/sec   Loss 2.0102   LearningRate 0.0039   Epoch: 16   Global Step: 199050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:34:53,481-Speed 2972.63 samples/sec   Loss 1.9799   LearningRate 0.0039   Epoch: 16   Global Step: 199060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:34:56,943-Speed 2958.94 samples/sec   Loss 1.9644   LearningRate 0.0039   Epoch: 16   Global Step: 199070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:00,331-Speed 3023.01 samples/sec   Loss 2.0197   LearningRate 0.0039   Epoch: 16   Global Step: 199080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:03,812-Speed 2943.09 samples/sec   Loss 1.9396   LearningRate 0.0039   Epoch: 16   Global Step: 199090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:07,120-Speed 3096.01 samples/sec   Loss 1.9240   LearningRate 0.0039   Epoch: 16   Global Step: 199100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:10,475-Speed 3052.84 samples/sec   Loss 1.9510   LearningRate 0.0039   Epoch: 16   Global Step: 199110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:13,877-Speed 3010.53 samples/sec   Loss 2.0006   LearningRate 0.0039   Epoch: 16   Global Step: 199120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:17,263-Speed 3025.93 samples/sec   Loss 1.9473   LearningRate 0.0039   Epoch: 16   Global Step: 199130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:20,597-Speed 3071.85 samples/sec   Loss 2.0190   LearningRate 0.0039   Epoch: 16   Global Step: 199140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:24,004-Speed 3006.86 samples/sec   Loss 1.9448   LearningRate 0.0039   Epoch: 16   Global Step: 199150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:27,329-Speed 3079.85 samples/sec   Loss 1.9311   LearningRate 0.0039   Epoch: 16   Global Step: 199160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:35:30,736-Speed 3006.92 samples/sec   Loss 2.0387   LearningRate 0.0039   Epoch: 16   Global Step: 199170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:35:34,036-Speed 3103.60 samples/sec   Loss 2.0298   LearningRate 0.0039   Epoch: 16   Global Step: 199180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:37,358-Speed 3083.69 samples/sec   Loss 1.9778   LearningRate 0.0039   Epoch: 16   Global Step: 199190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:40,799-Speed 2976.48 samples/sec   Loss 1.9971   LearningRate 0.0039   Epoch: 16   Global Step: 199200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:44,180-Speed 3029.71 samples/sec   Loss 1.9820   LearningRate 0.0039   Epoch: 16   Global Step: 199210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:47,548-Speed 3041.58 samples/sec   Loss 1.9844   LearningRate 0.0039   Epoch: 16   Global Step: 199220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:50,934-Speed 3024.76 samples/sec   Loss 1.9584   LearningRate 0.0039   Epoch: 16   Global Step: 199230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:54,354-Speed 2995.46 samples/sec   Loss 2.0317   LearningRate 0.0039   Epoch: 16   Global Step: 199240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:35:57,833-Speed 2944.00 samples/sec   Loss 1.9811   LearningRate 0.0039   Epoch: 16   Global Step: 199250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:01,235-Speed 3011.61 samples/sec   Loss 1.9362   LearningRate 0.0039   Epoch: 16   Global Step: 199260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:04,648-Speed 3001.14 samples/sec   Loss 1.9404   LearningRate 0.0039   Epoch: 16   Global Step: 199270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:08,049-Speed 3012.00 samples/sec   Loss 1.9934   LearningRate 0.0039   Epoch: 16   Global Step: 199280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:11,499-Speed 2968.94 samples/sec   Loss 1.9761   LearningRate 0.0039   Epoch: 16   Global Step: 199290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:14,883-Speed 3027.24 samples/sec   Loss 1.9961   LearningRate 0.0039   Epoch: 16   Global Step: 199300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:18,318-Speed 2981.59 samples/sec   Loss 1.9713   LearningRate 0.0039   Epoch: 16   Global Step: 199310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:21,681-Speed 3045.94 samples/sec   Loss 2.0378   LearningRate 0.0039   Epoch: 16   Global Step: 199320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:25,090-Speed 3004.81 samples/sec   Loss 2.0853   LearningRate 0.0039   Epoch: 16   Global Step: 199330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:28,485-Speed 3017.08 samples/sec   Loss 2.0343   LearningRate 0.0039   Epoch: 16   Global Step: 199340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:31,901-Speed 2998.24 samples/sec   Loss 2.0256   LearningRate 0.0039   Epoch: 16   Global Step: 199350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:35,252-Speed 3056.89 samples/sec   Loss 2.0131   LearningRate 0.0039   Epoch: 16   Global Step: 199360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:38,658-Speed 3007.68 samples/sec   Loss 2.0052   LearningRate 0.0039   Epoch: 16   Global Step: 199370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:42,117-Speed 2961.62 samples/sec   Loss 1.9758   LearningRate 0.0039   Epoch: 16   Global Step: 199380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:36:45,504-Speed 3024.29 samples/sec   Loss 2.0409   LearningRate 0.0039   Epoch: 16   Global Step: 199390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:36:48,911-Speed 3005.77 samples/sec   Loss 2.0182   LearningRate 0.0039   Epoch: 16   Global Step: 199400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:36:52,362-Speed 2968.04 samples/sec   Loss 2.0421   LearningRate 0.0039   Epoch: 16   Global Step: 199410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:36:55,836-Speed 2948.41 samples/sec   Loss 1.9997   LearningRate 0.0039   Epoch: 16   Global Step: 199420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:36:59,293-Speed 2963.59 samples/sec   Loss 2.0519   LearningRate 0.0039   Epoch: 16   Global Step: 199430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:02,725-Speed 2984.32 samples/sec   Loss 2.0213   LearningRate 0.0039   Epoch: 16   Global Step: 199440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:06,141-Speed 2998.63 samples/sec   Loss 2.0144   LearningRate 0.0039   Epoch: 16   Global Step: 199450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:09,601-Speed 2959.97 samples/sec   Loss 2.0368   LearningRate 0.0039   Epoch: 16   Global Step: 199460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:13,037-Speed 2981.59 samples/sec   Loss 2.0411   LearningRate 0.0039   Epoch: 16   Global Step: 199470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:16,517-Speed 2943.47 samples/sec   Loss 2.0208   LearningRate 0.0039   Epoch: 16   Global Step: 199480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:19,879-Speed 3046.16 samples/sec   Loss 2.0172   LearningRate 0.0039   Epoch: 16   Global Step: 199490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:23,291-Speed 3002.48 samples/sec   Loss 2.0493   LearningRate 0.0039   Epoch: 16   Global Step: 199500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:26,694-Speed 3009.59 samples/sec   Loss 1.9517   LearningRate 0.0039   Epoch: 16   Global Step: 199510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:30,082-Speed 3023.34 samples/sec   Loss 2.0129   LearningRate 0.0039   Epoch: 16   Global Step: 199520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:37:33,493-Speed 3002.48 samples/sec   Loss 2.0699   LearningRate 0.0039   Epoch: 16   Global Step: 199530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:36,886-Speed 3019.43 samples/sec   Loss 2.0195   LearningRate 0.0039   Epoch: 16   Global Step: 199540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:40,315-Speed 2987.04 samples/sec   Loss 2.0992   LearningRate 0.0039   Epoch: 16   Global Step: 199550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:43,670-Speed 3053.41 samples/sec   Loss 2.1023   LearningRate 0.0039   Epoch: 16   Global Step: 199560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:37:47,078-Speed 3005.73 samples/sec   Loss 2.0214   LearningRate 0.0039   Epoch: 16   Global Step: 199570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:37:50,508-Speed 2985.81 samples/sec   Loss 2.0878   LearningRate 0.0039   Epoch: 16   Global Step: 199580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:37:53,942-Speed 2982.84 samples/sec   Loss 2.0405   LearningRate 0.0039   Epoch: 16   Global Step: 199590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:37:57,411-Speed 2952.91 samples/sec   Loss 2.0722   LearningRate 0.0039   Epoch: 16   Global Step: 199600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:38:00,795-Speed 3026.63 samples/sec   Loss 2.0222   LearningRate 0.0039   Epoch: 16   Global Step: 199610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:38:04,185-Speed 3021.34 samples/sec   Loss 2.0021   LearningRate 0.0039   Epoch: 16   Global Step: 199620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:38:07,574-Speed 3022.70 samples/sec   Loss 1.9513   LearningRate 0.0039   Epoch: 16   Global Step: 199630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:38:10,927-Speed 3055.15 samples/sec   Loss 2.0236   LearningRate 0.0039   Epoch: 16   Global Step: 199640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:38:14,251-Speed 3081.29 samples/sec   Loss 2.0533   LearningRate 0.0039   Epoch: 16   Global Step: 199650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:38:17,651-Speed 3012.67 samples/sec   Loss 1.9919   LearningRate 0.0039   Epoch: 16   Global Step: 199660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:38:21,050-Speed 3013.35 samples/sec   Loss 2.0023   LearningRate 0.0039   Epoch: 16   Global Step: 199670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:38:24,488-Speed 2979.12 samples/sec   Loss 2.0394   LearningRate 0.0038   Epoch: 16   Global Step: 199680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:38:27,815-Speed 3078.35 samples/sec   Loss 2.0246   LearningRate 0.0038   Epoch: 16   Global Step: 199690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:38:31,191-Speed 3034.20 samples/sec   Loss 1.9500   LearningRate 0.0038   Epoch: 16   Global Step: 199700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:38:34,556-Speed 3044.14 samples/sec   Loss 2.0656   LearningRate 0.0038   Epoch: 16   Global Step: 199710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:38:37,892-Speed 3070.32 samples/sec   Loss 2.0520   LearningRate 0.0038   Epoch: 16   Global Step: 199720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:38:41,296-Speed 3009.70 samples/sec   Loss 2.0573   LearningRate 0.0038   Epoch: 16   Global Step: 199730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:38:44,682-Speed 3024.86 samples/sec   Loss 2.0814   LearningRate 0.0038   Epoch: 16   Global Step: 199740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:38:48,033-Speed 3056.27 samples/sec   Loss 2.0479   LearningRate 0.0038   Epoch: 16   Global Step: 199750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:38:51,470-Speed 2980.17 samples/sec   Loss 2.0366   LearningRate 0.0038   Epoch: 16   Global Step: 199760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:38:54,845-Speed 3034.94 samples/sec   Loss 2.0662   LearningRate 0.0038   Epoch: 16   Global Step: 199770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:38:58,148-Speed 3101.25 samples/sec   Loss 1.9697   LearningRate 0.0038   Epoch: 16   Global Step: 199780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:39:01,478-Speed 3075.84 samples/sec   Loss 2.0421   LearningRate 0.0038   Epoch: 16   Global Step: 199790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:39:04,878-Speed 3013.57 samples/sec   Loss 2.0197   LearningRate 0.0038   Epoch: 16   Global Step: 199800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:39:08,225-Speed 3059.98 samples/sec   Loss 2.0816   LearningRate 0.0038   Epoch: 16   Global Step: 199810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:39:11,635-Speed 3004.64 samples/sec   Loss 2.0286   LearningRate 0.0038   Epoch: 16   Global Step: 199820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:39:14,992-Speed 3050.69 samples/sec   Loss 2.0586   LearningRate 0.0038   Epoch: 16   Global Step: 199830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:39:18,399-Speed 3006.72 samples/sec   Loss 2.1058   LearningRate 0.0038   Epoch: 16   Global Step: 199840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:39:21,762-Speed 3045.50 samples/sec   Loss 2.1136   LearningRate 0.0038   Epoch: 16   Global Step: 199850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:39:25,179-Speed 2997.42 samples/sec   Loss 2.0337   LearningRate 0.0038   Epoch: 16   Global Step: 199860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:39:28,616-Speed 2980.46 samples/sec   Loss 2.0642   LearningRate 0.0038   Epoch: 16   Global Step: 199870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:39:32,021-Speed 3008.45 samples/sec   Loss 2.1094   LearningRate 0.0038   Epoch: 16   Global Step: 199880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:39:35,481-Speed 2960.01 samples/sec   Loss 2.0604   LearningRate 0.0038   Epoch: 16   Global Step: 199890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:39:38,915-Speed 2983.11 samples/sec   Loss 2.0755   LearningRate 0.0038   Epoch: 16   Global Step: 199900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:39:42,294-Speed 3030.72 samples/sec   Loss 1.9904   LearningRate 0.0038   Epoch: 16   Global Step: 199910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:39:45,713-Speed 2996.65 samples/sec   Loss 2.0467   LearningRate 0.0038   Epoch: 16   Global Step: 199920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:39:49,080-Speed 3041.71 samples/sec   Loss 1.9791   LearningRate 0.0038   Epoch: 16   Global Step: 199930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:39:52,477-Speed 3015.25 samples/sec   Loss 2.0004   LearningRate 0.0038   Epoch: 16   Global Step: 199940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:39:55,836-Speed 3049.61 samples/sec   Loss 2.0834   LearningRate 0.0038   Epoch: 16   Global Step: 199950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:39:59,155-Speed 3087.84 samples/sec   Loss 2.0576   LearningRate 0.0038   Epoch: 16   Global Step: 199960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:40:02,532-Speed 3032.68 samples/sec   Loss 2.0257   LearningRate 0.0038   Epoch: 16   Global Step: 199970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:40:05,932-Speed 3012.50 samples/sec   Loss 2.0987   LearningRate 0.0038   Epoch: 16   Global Step: 199980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:40:09,245-Speed 3092.14 samples/sec   Loss 2.0421   LearningRate 0.0038   Epoch: 16   Global Step: 199990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:40:12,640-Speed 3017.01 samples/sec   Loss 2.0218   LearningRate 0.0038   Epoch: 16   Global Step: 200000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:40:16,082-Speed 2975.31 samples/sec   Loss 1.9768   LearningRate 0.0038   Epoch: 16   Global Step: 200010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:40:19,476-Speed 3018.77 samples/sec   Loss 2.0719   LearningRate 0.0038   Epoch: 16   Global Step: 200020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:40:22,872-Speed 3015.53 samples/sec   Loss 2.0817   LearningRate 0.0038   Epoch: 16   Global Step: 200030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:40:26,245-Speed 3036.95 samples/sec   Loss 2.0549   LearningRate 0.0038   Epoch: 16   Global Step: 200040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:40:29,659-Speed 3000.06 samples/sec   Loss 2.0076   LearningRate 0.0038   Epoch: 16   Global Step: 200050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:40:33,013-Speed 3053.93 samples/sec   Loss 2.0926   LearningRate 0.0038   Epoch: 16   Global Step: 200060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:40:36,366-Speed 3055.09 samples/sec   Loss 2.1482   LearningRate 0.0038   Epoch: 16   Global Step: 200070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:40:39,821-Speed 2964.41 samples/sec   Loss 2.0936   LearningRate 0.0038   Epoch: 16   Global Step: 200080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:40:43,287-Speed 2955.94 samples/sec   Loss 2.0929   LearningRate 0.0038   Epoch: 16   Global Step: 200090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:40:46,714-Speed 2988.45 samples/sec   Loss 2.0717   LearningRate 0.0038   Epoch: 16   Global Step: 200100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:40:50,151-Speed 2980.19 samples/sec   Loss 2.0877   LearningRate 0.0038   Epoch: 16   Global Step: 200110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:40:53,579-Speed 2988.01 samples/sec   Loss 2.0738   LearningRate 0.0038   Epoch: 16   Global Step: 200120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:40:56,994-Speed 2999.64 samples/sec   Loss 2.0807   LearningRate 0.0038   Epoch: 16   Global Step: 200130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:41:00,394-Speed 3012.49 samples/sec   Loss 2.1095   LearningRate 0.0038   Epoch: 16   Global Step: 200140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:03,785-Speed 3021.14 samples/sec   Loss 2.1381   LearningRate 0.0038   Epoch: 16   Global Step: 200150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:07,152-Speed 3041.56 samples/sec   Loss 2.1115   LearningRate 0.0038   Epoch: 16   Global Step: 200160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:10,629-Speed 2946.21 samples/sec   Loss 2.1134   LearningRate 0.0038   Epoch: 16   Global Step: 200170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:14,077-Speed 2970.47 samples/sec   Loss 2.0748   LearningRate 0.0038   Epoch: 16   Global Step: 200180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:17,476-Speed 3013.75 samples/sec   Loss 2.1703   LearningRate 0.0038   Epoch: 16   Global Step: 200190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:20,879-Speed 3010.46 samples/sec   Loss 2.0707   LearningRate 0.0038   Epoch: 16   Global Step: 200200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:24,279-Speed 3011.85 samples/sec   Loss 2.0865   LearningRate 0.0038   Epoch: 16   Global Step: 200210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:27,678-Speed 3014.12 samples/sec   Loss 2.1714   LearningRate 0.0038   Epoch: 16   Global Step: 200220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:31,100-Speed 2992.89 samples/sec   Loss 2.0870   LearningRate 0.0038   Epoch: 16   Global Step: 200230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:34,465-Speed 3043.83 samples/sec   Loss 2.1486   LearningRate 0.0038   Epoch: 16   Global Step: 200240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:41:37,809-Speed 3062.89 samples/sec   Loss 2.0791   LearningRate 0.0038   Epoch: 16   Global Step: 200250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:41:41,216-Speed 3007.06 samples/sec   Loss 2.0994   LearningRate 0.0038   Epoch: 16   Global Step: 200260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:41:44,606-Speed 3021.00 samples/sec   Loss 2.0870   LearningRate 0.0038   Epoch: 16   Global Step: 200270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:47,958-Speed 3056.28 samples/sec   Loss 2.0420   LearningRate 0.0038   Epoch: 16   Global Step: 200280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:51,340-Speed 3028.26 samples/sec   Loss 2.0482   LearningRate 0.0038   Epoch: 16   Global Step: 200290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:54,707-Speed 3042.03 samples/sec   Loss 2.1250   LearningRate 0.0038   Epoch: 16   Global Step: 200300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:41:58,121-Speed 3000.47 samples/sec   Loss 2.1273   LearningRate 0.0038   Epoch: 16   Global Step: 200310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:42:01,513-Speed 3019.41 samples/sec   Loss 2.0940   LearningRate 0.0037   Epoch: 16   Global Step: 200320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:42:04,947-Speed 2983.28 samples/sec   Loss 2.1083   LearningRate 0.0037   Epoch: 16   Global Step: 200330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:42:08,415-Speed 2953.75 samples/sec   Loss 2.1022   LearningRate 0.0037   Epoch: 16   Global Step: 200340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:42:11,842-Speed 2988.26 samples/sec   Loss 2.0513   LearningRate 0.0037   Epoch: 16   Global Step: 200350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:42:15,183-Speed 3065.85 samples/sec   Loss 2.0879   LearningRate 0.0037   Epoch: 16   Global Step: 200360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:42:18,538-Speed 3053.67 samples/sec   Loss 2.0697   LearningRate 0.0037   Epoch: 16   Global Step: 200370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:42:21,863-Speed 3080.59 samples/sec   Loss 2.0448   LearningRate 0.0037   Epoch: 16   Global Step: 200380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:42:25,302-Speed 2978.27 samples/sec   Loss 2.1181   LearningRate 0.0037   Epoch: 16   Global Step: 200390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:42:28,723-Speed 2993.61 samples/sec   Loss 2.1443   LearningRate 0.0037   Epoch: 16   Global Step: 200400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:42:32,150-Speed 2989.05 samples/sec   Loss 2.0861   LearningRate 0.0037   Epoch: 16   Global Step: 200410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:42:35,534-Speed 3026.93 samples/sec   Loss 2.1159   LearningRate 0.0037   Epoch: 16   Global Step: 200420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:42:38,894-Speed 3048.72 samples/sec   Loss 2.1197   LearningRate 0.0037   Epoch: 16   Global Step: 200430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:42:42,304-Speed 3003.69 samples/sec   Loss 2.1232   LearningRate 0.0037   Epoch: 16   Global Step: 200440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:42:45,709-Speed 3008.42 samples/sec   Loss 2.0737   LearningRate 0.0037   Epoch: 16   Global Step: 200450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:42:49,101-Speed 3019.46 samples/sec   Loss 2.1253   LearningRate 0.0037   Epoch: 16   Global Step: 200460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:42:52,484-Speed 3028.11 samples/sec   Loss 2.1103   LearningRate 0.0037   Epoch: 16   Global Step: 200470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:42:55,874-Speed 3021.73 samples/sec   Loss 2.0558   LearningRate 0.0037   Epoch: 16   Global Step: 200480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:42:59,223-Speed 3058.74 samples/sec   Loss 2.0703   LearningRate 0.0037   Epoch: 16   Global Step: 200490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:43:02,546-Speed 3082.33 samples/sec   Loss 2.0839   LearningRate 0.0037   Epoch: 16   Global Step: 200500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:43:05,857-Speed 3093.80 samples/sec   Loss 2.0630   LearningRate 0.0037   Epoch: 16   Global Step: 200510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:43:09,327-Speed 2952.53 samples/sec   Loss 2.1244   LearningRate 0.0037   Epoch: 16   Global Step: 200520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:43:12,690-Speed 3045.96 samples/sec   Loss 2.0921   LearningRate 0.0037   Epoch: 16   Global Step: 200530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:43:16,017-Speed 3078.21 samples/sec   Loss 2.1231   LearningRate 0.0037   Epoch: 16   Global Step: 200540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:43:19,431-Speed 3000.82 samples/sec   Loss 2.1061   LearningRate 0.0037   Epoch: 16   Global Step: 200550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:43:22,860-Speed 2987.37 samples/sec   Loss 2.0934   LearningRate 0.0037   Epoch: 16   Global Step: 200560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:43:26,236-Speed 3033.62 samples/sec   Loss 2.0824   LearningRate 0.0037   Epoch: 16   Global Step: 200570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:43:29,598-Speed 3047.33 samples/sec   Loss 2.1720   LearningRate 0.0037   Epoch: 16   Global Step: 200580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:43:32,908-Speed 3094.51 samples/sec   Loss 2.1531   LearningRate 0.0037   Epoch: 16   Global Step: 200590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:43:36,270-Speed 3046.19 samples/sec   Loss 2.0939   LearningRate 0.0037   Epoch: 16   Global Step: 200600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:43:39,580-Speed 3094.92 samples/sec   Loss 2.0969   LearningRate 0.0037   Epoch: 16   Global Step: 200610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:43:42,975-Speed 3017.49 samples/sec   Loss 2.0823   LearningRate 0.0037   Epoch: 16   Global Step: 200620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 20:43:46,370-Speed 3016.73 samples/sec   Loss 2.0510   LearningRate 0.0037   Epoch: 16   Global Step: 200630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:43:49,722-Speed 3055.54 samples/sec   Loss 2.1134   LearningRate 0.0037   Epoch: 16   Global Step: 200640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:43:53,103-Speed 3030.26 samples/sec   Loss 2.1302   LearningRate 0.0037   Epoch: 16   Global Step: 200650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:43:56,475-Speed 3037.76 samples/sec   Loss 2.0757   LearningRate 0.0037   Epoch: 16   Global Step: 200660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:43:59,873-Speed 3014.48 samples/sec   Loss 2.1882   LearningRate 0.0037   Epoch: 16   Global Step: 200670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:44:03,257-Speed 3027.00 samples/sec   Loss 2.1721   LearningRate 0.0037   Epoch: 16   Global Step: 200680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:44:06,646-Speed 3021.71 samples/sec   Loss 2.0819   LearningRate 0.0037   Epoch: 16   Global Step: 200690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:44:10,057-Speed 3002.89 samples/sec   Loss 2.1781   LearningRate 0.0037   Epoch: 16   Global Step: 200700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:44:13,421-Speed 3045.10 samples/sec   Loss 2.1461   LearningRate 0.0037   Epoch: 16   Global Step: 200710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:44:16,735-Speed 3090.62 samples/sec   Loss 2.0947   LearningRate 0.0037   Epoch: 16   Global Step: 200720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:44:20,247-Speed 2917.27 samples/sec   Loss 2.1161   LearningRate 0.0037   Epoch: 16   Global Step: 200730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 20:44:23,669-Speed 2992.72 samples/sec   Loss 2.0671   LearningRate 0.0037   Epoch: 16   Global Step: 200740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:44:27,035-Speed 3043.00 samples/sec   Loss 2.1239   LearningRate 0.0037   Epoch: 16   Global Step: 200750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:44:30,416-Speed 3029.60 samples/sec   Loss 2.1102   LearningRate 0.0037   Epoch: 16   Global Step: 200760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:44:33,835-Speed 2995.53 samples/sec   Loss 2.1841   LearningRate 0.0037   Epoch: 16   Global Step: 200770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:44:37,190-Speed 3053.27 samples/sec   Loss 2.2120   LearningRate 0.0037   Epoch: 16   Global Step: 200780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:44:40,630-Speed 2977.61 samples/sec   Loss 2.1787   LearningRate 0.0037   Epoch: 16   Global Step: 200790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:44:44,054-Speed 2991.17 samples/sec   Loss 2.1792   LearningRate 0.0037   Epoch: 16   Global Step: 200800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 20:44:47,456-Speed 3011.38 samples/sec   Loss 2.1537   LearningRate 0.0037   Epoch: 16   Global Step: 200810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:44:50,856-Speed 3012.83 samples/sec   Loss 2.1613   LearningRate 0.0037   Epoch: 16   Global Step: 200820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:44:54,259-Speed 3009.53 samples/sec   Loss 2.1532   LearningRate 0.0037   Epoch: 16   Global Step: 200830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:44:57,580-Speed 3084.24 samples/sec   Loss 2.0772   LearningRate 0.0037   Epoch: 16   Global Step: 200840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:00,970-Speed 3021.30 samples/sec   Loss 2.1543   LearningRate 0.0037   Epoch: 16   Global Step: 200850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:04,317-Speed 3060.58 samples/sec   Loss 2.1163   LearningRate 0.0037   Epoch: 16   Global Step: 200860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:07,756-Speed 2978.43 samples/sec   Loss 2.1398   LearningRate 0.0037   Epoch: 16   Global Step: 200870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:11,085-Speed 3077.23 samples/sec   Loss 2.1340   LearningRate 0.0037   Epoch: 16   Global Step: 200880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:14,455-Speed 3039.51 samples/sec   Loss 2.1114   LearningRate 0.0037   Epoch: 16   Global Step: 200890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:17,789-Speed 3072.43 samples/sec   Loss 2.1708   LearningRate 0.0037   Epoch: 16   Global Step: 200900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:21,272-Speed 2941.19 samples/sec   Loss 2.1109   LearningRate 0.0037   Epoch: 16   Global Step: 200910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:45:24,653-Speed 3028.88 samples/sec   Loss 2.0922   LearningRate 0.0037   Epoch: 16   Global Step: 200920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:45:28,012-Speed 3050.06 samples/sec   Loss 2.1680   LearningRate 0.0037   Epoch: 16   Global Step: 200930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:45:31,386-Speed 3034.93 samples/sec   Loss 2.1245   LearningRate 0.0037   Epoch: 16   Global Step: 200940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:45:34,708-Speed 3084.55 samples/sec   Loss 2.1629   LearningRate 0.0037   Epoch: 16   Global Step: 200950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:38,039-Speed 3074.79 samples/sec   Loss 2.0895   LearningRate 0.0036   Epoch: 16   Global Step: 200960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:41,463-Speed 2991.83 samples/sec   Loss 2.2004   LearningRate 0.0036   Epoch: 16   Global Step: 200970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:44,871-Speed 3005.00 samples/sec   Loss 2.1089   LearningRate 0.0036   Epoch: 16   Global Step: 200980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:48,271-Speed 3012.59 samples/sec   Loss 2.1032   LearningRate 0.0036   Epoch: 16   Global Step: 200990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:51,621-Speed 3058.12 samples/sec   Loss 2.1500   LearningRate 0.0036   Epoch: 16   Global Step: 201000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:55,012-Speed 3020.15 samples/sec   Loss 2.1170   LearningRate 0.0036   Epoch: 16   Global Step: 201010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:45:58,385-Speed 3036.83 samples/sec   Loss 2.1493   LearningRate 0.0036   Epoch: 16   Global Step: 201020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:46:01,832-Speed 2971.70 samples/sec   Loss 2.1985   LearningRate 0.0036   Epoch: 16   Global Step: 201030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:46:05,192-Speed 3048.75 samples/sec   Loss 2.0868   LearningRate 0.0036   Epoch: 16   Global Step: 201040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:46:08,540-Speed 3059.01 samples/sec   Loss 2.1441   LearningRate 0.0036   Epoch: 16   Global Step: 201050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:46:11,945-Speed 3008.70 samples/sec   Loss 2.1198   LearningRate 0.0036   Epoch: 16   Global Step: 201060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:46:15,300-Speed 3052.88 samples/sec   Loss 2.1897   LearningRate 0.0036   Epoch: 16   Global Step: 201070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:46:18,670-Speed 3039.49 samples/sec   Loss 2.2597   LearningRate 0.0036   Epoch: 16   Global Step: 201080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:46:21,995-Speed 3083.15 samples/sec   Loss 2.1887   LearningRate 0.0036   Epoch: 16   Global Step: 201090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:46:25,439-Speed 2974.14 samples/sec   Loss 2.1616   LearningRate 0.0036   Epoch: 16   Global Step: 201100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:46:28,820-Speed 3029.70 samples/sec   Loss 2.2096   LearningRate 0.0036   Epoch: 16   Global Step: 201110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:46:32,256-Speed 2980.30 samples/sec   Loss 2.1545   LearningRate 0.0036   Epoch: 16   Global Step: 201120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:46:35,701-Speed 2973.16 samples/sec   Loss 2.1587   LearningRate 0.0036   Epoch: 16   Global Step: 201130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:46:39,126-Speed 2990.96 samples/sec   Loss 2.2308   LearningRate 0.0036   Epoch: 16   Global Step: 201140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:46:42,550-Speed 2991.82 samples/sec   Loss 2.2249   LearningRate 0.0036   Epoch: 16   Global Step: 201150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:46:45,970-Speed 2994.88 samples/sec   Loss 2.1648   LearningRate 0.0036   Epoch: 16   Global Step: 201160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:46:49,402-Speed 2984.64 samples/sec   Loss 2.1433   LearningRate 0.0036   Epoch: 16   Global Step: 201170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:46:52,834-Speed 2984.05 samples/sec   Loss 2.1517   LearningRate 0.0036   Epoch: 16   Global Step: 201180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:46:56,189-Speed 3053.38 samples/sec   Loss 2.2086   LearningRate 0.0036   Epoch: 16   Global Step: 201190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:46:59,580-Speed 3020.43 samples/sec   Loss 2.1109   LearningRate 0.0036   Epoch: 16   Global Step: 201200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:02,887-Speed 3097.54 samples/sec   Loss 2.1581   LearningRate 0.0036   Epoch: 16   Global Step: 201210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:06,223-Speed 3070.99 samples/sec   Loss 2.2045   LearningRate 0.0036   Epoch: 16   Global Step: 201220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:09,557-Speed 3072.33 samples/sec   Loss 2.1390   LearningRate 0.0036   Epoch: 16   Global Step: 201230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:12,901-Speed 3063.14 samples/sec   Loss 2.2256   LearningRate 0.0036   Epoch: 16   Global Step: 201240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:16,215-Speed 3090.68 samples/sec   Loss 2.1712   LearningRate 0.0036   Epoch: 16   Global Step: 201250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:19,628-Speed 3001.03 samples/sec   Loss 2.1499   LearningRate 0.0036   Epoch: 16   Global Step: 201260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:22,958-Speed 3076.79 samples/sec   Loss 2.1804   LearningRate 0.0036   Epoch: 16   Global Step: 201270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:26,326-Speed 3041.12 samples/sec   Loss 2.2307   LearningRate 0.0036   Epoch: 16   Global Step: 201280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:47:29,645-Speed 3086.04 samples/sec   Loss 2.1768   LearningRate 0.0036   Epoch: 16   Global Step: 201290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:33,031-Speed 3025.65 samples/sec   Loss 2.1432   LearningRate 0.0036   Epoch: 16   Global Step: 201300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:36,398-Speed 3042.36 samples/sec   Loss 2.2272   LearningRate 0.0036   Epoch: 16   Global Step: 201310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:39,802-Speed 3008.36 samples/sec   Loss 2.1723   LearningRate 0.0036   Epoch: 16   Global Step: 201320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:43,140-Speed 3069.13 samples/sec   Loss 2.1437   LearningRate 0.0036   Epoch: 16   Global Step: 201330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:46,539-Speed 3013.58 samples/sec   Loss 2.1771   LearningRate 0.0036   Epoch: 16   Global Step: 201340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:49,875-Speed 3069.99 samples/sec   Loss 2.2399   LearningRate 0.0036   Epoch: 16   Global Step: 201350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:53,308-Speed 2983.32 samples/sec   Loss 2.1917   LearningRate 0.0036   Epoch: 16   Global Step: 201360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:47:56,659-Speed 3056.94 samples/sec   Loss 2.1697   LearningRate 0.0036   Epoch: 16   Global Step: 201370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:48:00,075-Speed 2998.73 samples/sec   Loss 2.1543   LearningRate 0.0036   Epoch: 16   Global Step: 201380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:48:03,484-Speed 3004.00 samples/sec   Loss 2.1905   LearningRate 0.0036   Epoch: 16   Global Step: 201390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:48:06,853-Speed 3041.07 samples/sec   Loss 2.1844   LearningRate 0.0036   Epoch: 16   Global Step: 201400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:48:10,221-Speed 3041.48 samples/sec   Loss 2.2061   LearningRate 0.0036   Epoch: 16   Global Step: 201410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:48:13,582-Speed 3047.75 samples/sec   Loss 2.1623   LearningRate 0.0036   Epoch: 16   Global Step: 201420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:48:16,992-Speed 3003.75 samples/sec   Loss 2.0827   LearningRate 0.0036   Epoch: 16   Global Step: 201430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:48:20,325-Speed 3073.37 samples/sec   Loss 2.1936   LearningRate 0.0036   Epoch: 16   Global Step: 201440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:48:23,700-Speed 3034.52 samples/sec   Loss 2.1413   LearningRate 0.0036   Epoch: 16   Global Step: 201450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:48:27,051-Speed 3056.56 samples/sec   Loss 2.1733   LearningRate 0.0036   Epoch: 16   Global Step: 201460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:48:30,435-Speed 3026.49 samples/sec   Loss 2.1640   LearningRate 0.0036   Epoch: 16   Global Step: 201470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:48:33,832-Speed 3015.36 samples/sec   Loss 2.1319   LearningRate 0.0036   Epoch: 16   Global Step: 201480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:48:37,304-Speed 2950.14 samples/sec   Loss 2.1952   LearningRate 0.0036   Epoch: 16   Global Step: 201490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:48:40,786-Speed 2941.74 samples/sec   Loss 2.1448   LearningRate 0.0036   Epoch: 16   Global Step: 201500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:48:44,191-Speed 3008.59 samples/sec   Loss 2.1914   LearningRate 0.0036   Epoch: 16   Global Step: 201510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:48:47,513-Speed 3083.47 samples/sec   Loss 2.1784   LearningRate 0.0036   Epoch: 16   Global Step: 201520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:48:50,883-Speed 3038.99 samples/sec   Loss 2.2367   LearningRate 0.0036   Epoch: 16   Global Step: 201530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:48:54,271-Speed 3023.41 samples/sec   Loss 2.1502   LearningRate 0.0036   Epoch: 16   Global Step: 201540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:48:57,744-Speed 2949.76 samples/sec   Loss 2.1156   LearningRate 0.0036   Epoch: 16   Global Step: 201550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:49:01,097-Speed 3054.89 samples/sec   Loss 2.2150   LearningRate 0.0036   Epoch: 16   Global Step: 201560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:49:04,492-Speed 3017.08 samples/sec   Loss 2.1812   LearningRate 0.0036   Epoch: 16   Global Step: 201570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:49:07,830-Speed 3067.96 samples/sec   Loss 2.0595   LearningRate 0.0036   Epoch: 16   Global Step: 201580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:49:11,272-Speed 2975.92 samples/sec   Loss 2.2335   LearningRate 0.0036   Epoch: 16   Global Step: 201590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:49:14,708-Speed 2981.10 samples/sec   Loss 2.2077   LearningRate 0.0036   Epoch: 16   Global Step: 201600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:49:18,155-Speed 2971.90 samples/sec   Loss 2.1570   LearningRate 0.0036   Epoch: 16   Global Step: 201610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:49:21,556-Speed 3011.70 samples/sec   Loss 2.1811   LearningRate 0.0035   Epoch: 16   Global Step: 201620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:49:24,973-Speed 2997.51 samples/sec   Loss 2.1569   LearningRate 0.0035   Epoch: 16   Global Step: 201630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:49:28,378-Speed 3008.38 samples/sec   Loss 2.1876   LearningRate 0.0035   Epoch: 16   Global Step: 201640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:49:31,728-Speed 3057.95 samples/sec   Loss 2.1890   LearningRate 0.0035   Epoch: 16   Global Step: 201650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:49:35,138-Speed 3003.43 samples/sec   Loss 2.1629   LearningRate 0.0035   Epoch: 16   Global Step: 201660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:49:38,633-Speed 2930.40 samples/sec   Loss 2.2146   LearningRate 0.0035   Epoch: 16   Global Step: 201670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:49:42,062-Speed 2987.65 samples/sec   Loss 2.1398   LearningRate 0.0035   Epoch: 16   Global Step: 201680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:49:45,417-Speed 3052.45 samples/sec   Loss 2.1294   LearningRate 0.0035   Epoch: 16   Global Step: 201690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:49:48,822-Speed 3008.77 samples/sec   Loss 2.1987   LearningRate 0.0035   Epoch: 16   Global Step: 201700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:49:52,199-Speed 3032.90 samples/sec   Loss 2.1773   LearningRate 0.0035   Epoch: 16   Global Step: 201710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:49:55,642-Speed 2975.04 samples/sec   Loss 2.2095   LearningRate 0.0035   Epoch: 16   Global Step: 201720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:49:59,067-Speed 2990.40 samples/sec   Loss 2.1701   LearningRate 0.0035   Epoch: 16   Global Step: 201730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:50:02,425-Speed 3049.69 samples/sec   Loss 2.1066   LearningRate 0.0035   Epoch: 16   Global Step: 201740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:50:05,753-Speed 3077.94 samples/sec   Loss 2.1955   LearningRate 0.0035   Epoch: 16   Global Step: 201750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:50:09,076-Speed 3083.41 samples/sec   Loss 2.1808   LearningRate 0.0035   Epoch: 16   Global Step: 201760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:50:12,470-Speed 3017.68 samples/sec   Loss 2.1606   LearningRate 0.0035   Epoch: 16   Global Step: 201770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:15,901-Speed 2985.39 samples/sec   Loss 2.1523   LearningRate 0.0035   Epoch: 16   Global Step: 201780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:19,299-Speed 3014.13 samples/sec   Loss 2.1087   LearningRate 0.0035   Epoch: 16   Global Step: 201790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:22,677-Speed 3032.33 samples/sec   Loss 2.2575   LearningRate 0.0035   Epoch: 16   Global Step: 201800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:26,095-Speed 2997.04 samples/sec   Loss 2.0748   LearningRate 0.0035   Epoch: 16   Global Step: 201810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:29,559-Speed 2956.62 samples/sec   Loss 2.2106   LearningRate 0.0035   Epoch: 16   Global Step: 201820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:32,908-Speed 3059.04 samples/sec   Loss 2.2711   LearningRate 0.0035   Epoch: 16   Global Step: 201830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:36,274-Speed 3042.74 samples/sec   Loss 2.2350   LearningRate 0.0035   Epoch: 16   Global Step: 201840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:39,684-Speed 3003.71 samples/sec   Loss 2.1919   LearningRate 0.0035   Epoch: 16   Global Step: 201850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:43,063-Speed 3031.04 samples/sec   Loss 2.2283   LearningRate 0.0035   Epoch: 16   Global Step: 201860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:46,400-Speed 3069.72 samples/sec   Loss 2.1583   LearningRate 0.0035   Epoch: 16   Global Step: 201870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:50:49,738-Speed 3069.06 samples/sec   Loss 2.2094   LearningRate 0.0035   Epoch: 16   Global Step: 201880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:53,065-Speed 3078.22 samples/sec   Loss 2.1478   LearningRate 0.0035   Epoch: 16   Global Step: 201890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:56,430-Speed 3044.29 samples/sec   Loss 2.1823   LearningRate 0.0035   Epoch: 16   Global Step: 201900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:50:59,857-Speed 2988.33 samples/sec   Loss 2.1749   LearningRate 0.0035   Epoch: 16   Global Step: 201910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:51:03,266-Speed 3004.54 samples/sec   Loss 2.2529   LearningRate 0.0035   Epoch: 16   Global Step: 201920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:51:06,637-Speed 3038.28 samples/sec   Loss 2.2070   LearningRate 0.0035   Epoch: 16   Global Step: 201930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:51:10,041-Speed 3009.45 samples/sec   Loss 2.1572   LearningRate 0.0035   Epoch: 16   Global Step: 201940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:51:13,374-Speed 3073.55 samples/sec   Loss 2.1354   LearningRate 0.0035   Epoch: 16   Global Step: 201950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:51:16,719-Speed 3061.20 samples/sec   Loss 2.2474   LearningRate 0.0035   Epoch: 16   Global Step: 201960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:51:20,075-Speed 3053.08 samples/sec   Loss 2.2302   LearningRate 0.0035   Epoch: 16   Global Step: 201970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:51:23,446-Speed 3038.84 samples/sec   Loss 2.2096   LearningRate 0.0035   Epoch: 16   Global Step: 201980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:51:26,804-Speed 3049.79 samples/sec   Loss 2.1778   LearningRate 0.0035   Epoch: 16   Global Step: 201990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:51:30,147-Speed 3063.81 samples/sec   Loss 2.1592   LearningRate 0.0035   Epoch: 16   Global Step: 202000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:51:33,458-Speed 3094.09 samples/sec   Loss 2.2727   LearningRate 0.0035   Epoch: 16   Global Step: 202010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:51:36,836-Speed 3031.78 samples/sec   Loss 2.2583   LearningRate 0.0035   Epoch: 16   Global Step: 202020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:51:40,165-Speed 3076.63 samples/sec   Loss 2.2034   LearningRate 0.0035   Epoch: 16   Global Step: 202030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:51:43,505-Speed 3067.30 samples/sec   Loss 2.2367   LearningRate 0.0035   Epoch: 16   Global Step: 202040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:51:46,930-Speed 2990.26 samples/sec   Loss 2.1453   LearningRate 0.0035   Epoch: 16   Global Step: 202050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:51:50,292-Speed 3046.71 samples/sec   Loss 2.2167   LearningRate 0.0035   Epoch: 16   Global Step: 202060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:51:53,666-Speed 3035.61 samples/sec   Loss 2.2735   LearningRate 0.0035   Epoch: 16   Global Step: 202070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:51:56,975-Speed 3095.00 samples/sec   Loss 2.2506   LearningRate 0.0035   Epoch: 16   Global Step: 202080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:52:00,337-Speed 3046.79 samples/sec   Loss 2.1944   LearningRate 0.0035   Epoch: 16   Global Step: 202090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:03,729-Speed 3020.21 samples/sec   Loss 2.2320   LearningRate 0.0035   Epoch: 16   Global Step: 202100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:07,062-Speed 3073.05 samples/sec   Loss 2.2610   LearningRate 0.0035   Epoch: 16   Global Step: 202110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:10,477-Speed 2999.26 samples/sec   Loss 2.2391   LearningRate 0.0035   Epoch: 16   Global Step: 202120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:13,820-Speed 3063.83 samples/sec   Loss 2.2527   LearningRate 0.0035   Epoch: 16   Global Step: 202130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:17,234-Speed 2999.99 samples/sec   Loss 2.1815   LearningRate 0.0035   Epoch: 16   Global Step: 202140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:20,624-Speed 3021.66 samples/sec   Loss 2.1651   LearningRate 0.0035   Epoch: 16   Global Step: 202150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:24,038-Speed 3000.31 samples/sec   Loss 2.2155   LearningRate 0.0035   Epoch: 16   Global Step: 202160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:27,404-Speed 3042.57 samples/sec   Loss 2.2060   LearningRate 0.0035   Epoch: 16   Global Step: 202170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:30,752-Speed 3059.90 samples/sec   Loss 2.2000   LearningRate 0.0035   Epoch: 16   Global Step: 202180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:34,067-Speed 3089.33 samples/sec   Loss 2.2314   LearningRate 0.0035   Epoch: 16   Global Step: 202190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:52:37,380-Speed 3091.60 samples/sec   Loss 2.3126   LearningRate 0.0035   Epoch: 16   Global Step: 202200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:40,736-Speed 3052.60 samples/sec   Loss 2.2413   LearningRate 0.0035   Epoch: 16   Global Step: 202210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:44,080-Speed 3063.04 samples/sec   Loss 2.2166   LearningRate 0.0035   Epoch: 16   Global Step: 202220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:47,427-Speed 3060.00 samples/sec   Loss 2.1653   LearningRate 0.0035   Epoch: 16   Global Step: 202230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:50,801-Speed 3036.30 samples/sec   Loss 2.2542   LearningRate 0.0035   Epoch: 16   Global Step: 202240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:54,157-Speed 3053.12 samples/sec   Loss 2.2191   LearningRate 0.0035   Epoch: 16   Global Step: 202250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:52:57,498-Speed 3065.49 samples/sec   Loss 2.2541   LearningRate 0.0035   Epoch: 16   Global Step: 202260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:00,970-Speed 2950.14 samples/sec   Loss 2.1921   LearningRate 0.0035   Epoch: 16   Global Step: 202270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:04,380-Speed 3003.87 samples/sec   Loss 2.2345   LearningRate 0.0034   Epoch: 16   Global Step: 202280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:07,768-Speed 3022.68 samples/sec   Loss 2.2270   LearningRate 0.0034   Epoch: 16   Global Step: 202290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:11,131-Speed 3045.34 samples/sec   Loss 2.1980   LearningRate 0.0034   Epoch: 16   Global Step: 202300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:53:14,478-Speed 3061.05 samples/sec   Loss 2.1950   LearningRate 0.0034   Epoch: 16   Global Step: 202310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:17,900-Speed 2993.48 samples/sec   Loss 2.2751   LearningRate 0.0034   Epoch: 16   Global Step: 202320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:21,278-Speed 3031.96 samples/sec   Loss 2.1791   LearningRate 0.0034   Epoch: 16   Global Step: 202330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:24,635-Speed 3051.41 samples/sec   Loss 2.2322   LearningRate 0.0034   Epoch: 16   Global Step: 202340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:28,044-Speed 3004.08 samples/sec   Loss 2.2549   LearningRate 0.0034   Epoch: 16   Global Step: 202350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:31,516-Speed 2950.62 samples/sec   Loss 2.1911   LearningRate 0.0034   Epoch: 16   Global Step: 202360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:34,971-Speed 2964.53 samples/sec   Loss 2.1953   LearningRate 0.0034   Epoch: 16   Global Step: 202370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:38,455-Speed 2939.89 samples/sec   Loss 2.1754   LearningRate 0.0034   Epoch: 16   Global Step: 202380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:41,878-Speed 2993.09 samples/sec   Loss 2.2417   LearningRate 0.0034   Epoch: 16   Global Step: 202390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:45,329-Speed 2968.87 samples/sec   Loss 2.1764   LearningRate 0.0034   Epoch: 16   Global Step: 202400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:48,706-Speed 3032.64 samples/sec   Loss 2.3088   LearningRate 0.0034   Epoch: 16   Global Step: 202410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:52,082-Speed 3034.52 samples/sec   Loss 2.2863   LearningRate 0.0034   Epoch: 16   Global Step: 202420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:55,420-Speed 3068.37 samples/sec   Loss 2.2238   LearningRate 0.0034   Epoch: 16   Global Step: 202430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:53:58,782-Speed 3046.37 samples/sec   Loss 2.2027   LearningRate 0.0034   Epoch: 16   Global Step: 202440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:02,115-Speed 3073.19 samples/sec   Loss 2.2259   LearningRate 0.0034   Epoch: 16   Global Step: 202450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:05,455-Speed 3066.12 samples/sec   Loss 2.1762   LearningRate 0.0034   Epoch: 16   Global Step: 202460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:08,836-Speed 3030.04 samples/sec   Loss 2.2283   LearningRate 0.0034   Epoch: 16   Global Step: 202470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:12,181-Speed 3062.70 samples/sec   Loss 2.2003   LearningRate 0.0034   Epoch: 16   Global Step: 202480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:15,531-Speed 3057.40 samples/sec   Loss 2.2755   LearningRate 0.0034   Epoch: 16   Global Step: 202490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:18,898-Speed 3041.93 samples/sec   Loss 2.2732   LearningRate 0.0034   Epoch: 16   Global Step: 202500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:22,245-Speed 3060.58 samples/sec   Loss 2.2663   LearningRate 0.0034   Epoch: 16   Global Step: 202510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:54:25,593-Speed 3059.94 samples/sec   Loss 2.2238   LearningRate 0.0034   Epoch: 16   Global Step: 202520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:28,936-Speed 3063.58 samples/sec   Loss 2.1694   LearningRate 0.0034   Epoch: 16   Global Step: 202530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:32,327-Speed 3021.02 samples/sec   Loss 2.2276   LearningRate 0.0034   Epoch: 16   Global Step: 202540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:35,755-Speed 2987.72 samples/sec   Loss 2.1924   LearningRate 0.0034   Epoch: 16   Global Step: 202550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:39,102-Speed 3059.92 samples/sec   Loss 2.2344   LearningRate 0.0034   Epoch: 16   Global Step: 202560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:42,591-Speed 2935.85 samples/sec   Loss 2.1872   LearningRate 0.0034   Epoch: 16   Global Step: 202570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:45,976-Speed 3026.05 samples/sec   Loss 2.2105   LearningRate 0.0034   Epoch: 16   Global Step: 202580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:49,376-Speed 3012.48 samples/sec   Loss 2.2930   LearningRate 0.0034   Epoch: 16   Global Step: 202590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:52,790-Speed 3001.05 samples/sec   Loss 2.2242   LearningRate 0.0034   Epoch: 16   Global Step: 202600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:56,185-Speed 3016.88 samples/sec   Loss 2.2647   LearningRate 0.0034   Epoch: 16   Global Step: 202610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:54:59,536-Speed 3056.61 samples/sec   Loss 2.2278   LearningRate 0.0034   Epoch: 16   Global Step: 202620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:55:02,947-Speed 3003.02 samples/sec   Loss 2.2135   LearningRate 0.0034   Epoch: 16   Global Step: 202630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:06,365-Speed 2997.36 samples/sec   Loss 2.2595   LearningRate 0.0034   Epoch: 16   Global Step: 202640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:09,716-Speed 3055.94 samples/sec   Loss 2.2120   LearningRate 0.0034   Epoch: 16   Global Step: 202650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:13,128-Speed 3002.89 samples/sec   Loss 2.2163   LearningRate 0.0034   Epoch: 16   Global Step: 202660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:16,429-Speed 3102.47 samples/sec   Loss 2.3007   LearningRate 0.0034   Epoch: 16   Global Step: 202670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:19,759-Speed 3076.23 samples/sec   Loss 2.1757   LearningRate 0.0034   Epoch: 16   Global Step: 202680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:23,072-Speed 3092.32 samples/sec   Loss 2.1885   LearningRate 0.0034   Epoch: 16   Global Step: 202690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:26,409-Speed 3069.22 samples/sec   Loss 2.2708   LearningRate 0.0034   Epoch: 16   Global Step: 202700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:29,844-Speed 2982.17 samples/sec   Loss 2.1881   LearningRate 0.0034   Epoch: 16   Global Step: 202710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:33,317-Speed 2948.96 samples/sec   Loss 2.1900   LearningRate 0.0034   Epoch: 16   Global Step: 202720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:36,725-Speed 3005.68 samples/sec   Loss 2.2357   LearningRate 0.0034   Epoch: 16   Global Step: 202730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:55:40,077-Speed 3055.62 samples/sec   Loss 2.2567   LearningRate 0.0034   Epoch: 16   Global Step: 202740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:55:43,484-Speed 3007.44 samples/sec   Loss 2.2734   LearningRate 0.0034   Epoch: 16   Global Step: 202750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:55:46,851-Speed 3041.77 samples/sec   Loss 2.2727   LearningRate 0.0034   Epoch: 16   Global Step: 202760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:50,191-Speed 3066.97 samples/sec   Loss 2.2023   LearningRate 0.0034   Epoch: 16   Global Step: 202770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:53,504-Speed 3091.29 samples/sec   Loss 2.2346   LearningRate 0.0034   Epoch: 16   Global Step: 202780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:55:56,881-Speed 3035.04 samples/sec   Loss 2.2674   LearningRate 0.0034   Epoch: 16   Global Step: 202790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:56:00,380-Speed 2927.08 samples/sec   Loss 2.2370   LearningRate 0.0034   Epoch: 16   Global Step: 202800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:56:03,770-Speed 3021.34 samples/sec   Loss 2.2076   LearningRate 0.0034   Epoch: 16   Global Step: 202810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:56:07,134-Speed 3044.76 samples/sec   Loss 2.2768   LearningRate 0.0034   Epoch: 16   Global Step: 202820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:56:10,535-Speed 3012.43 samples/sec   Loss 2.2010   LearningRate 0.0034   Epoch: 16   Global Step: 202830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:56:13,853-Speed 3086.95 samples/sec   Loss 2.2178   LearningRate 0.0034   Epoch: 16   Global Step: 202840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:56:17,173-Speed 3085.47 samples/sec   Loss 2.2290   LearningRate 0.0034   Epoch: 16   Global Step: 202850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:56:20,555-Speed 3028.58 samples/sec   Loss 2.2131   LearningRate 0.0034   Epoch: 16   Global Step: 202860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:56:23,926-Speed 3037.73 samples/sec   Loss 2.2185   LearningRate 0.0034   Epoch: 16   Global Step: 202870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:56:27,282-Speed 3052.84 samples/sec   Loss 2.2385   LearningRate 0.0034   Epoch: 16   Global Step: 202880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:56:30,690-Speed 3004.98 samples/sec   Loss 2.2856   LearningRate 0.0034   Epoch: 16   Global Step: 202890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:56:34,020-Speed 3076.07 samples/sec   Loss 2.2628   LearningRate 0.0034   Epoch: 16   Global Step: 202900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:56:37,455-Speed 2982.03 samples/sec   Loss 2.2148   LearningRate 0.0034   Epoch: 16   Global Step: 202910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:56:40,788-Speed 3073.01 samples/sec   Loss 2.2594   LearningRate 0.0034   Epoch: 16   Global Step: 202920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:56:44,116-Speed 3078.16 samples/sec   Loss 2.2441   LearningRate 0.0034   Epoch: 16   Global Step: 202930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:56:47,487-Speed 3037.92 samples/sec   Loss 2.2292   LearningRate 0.0034   Epoch: 16   Global Step: 202940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:56:50,888-Speed 3011.85 samples/sec   Loss 2.2753   LearningRate 0.0034   Epoch: 16   Global Step: 202950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:56:54,236-Speed 3059.55 samples/sec   Loss 2.1611   LearningRate 0.0033   Epoch: 16   Global Step: 202960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:56:57,595-Speed 3049.16 samples/sec   Loss 2.1904   LearningRate 0.0033   Epoch: 16   Global Step: 202970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:57:00,999-Speed 3009.79 samples/sec   Loss 2.2367   LearningRate 0.0033   Epoch: 16   Global Step: 202980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:57:04,458-Speed 2960.98 samples/sec   Loss 2.1506   LearningRate 0.0033   Epoch: 16   Global Step: 202990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:57:08,496-Speed 2536.81 samples/sec   Loss 2.2850   LearningRate 0.0033   Epoch: 16   Global Step: 203000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:57:11,869-Speed 3036.71 samples/sec   Loss 2.3546   LearningRate 0.0033   Epoch: 16   Global Step: 203010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:57:15,299-Speed 2985.67 samples/sec   Loss 2.3189   LearningRate 0.0033   Epoch: 16   Global Step: 203020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:57:20,535-Speed 1955.99 samples/sec   Loss 2.2591   LearningRate 0.0033   Epoch: 16   Global Step: 203030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:57:23,980-Speed 2973.73 samples/sec   Loss 2.2514   LearningRate 0.0033   Epoch: 16   Global Step: 203040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:57:27,472-Speed 2932.86 samples/sec   Loss 2.2773   LearningRate 0.0033   Epoch: 16   Global Step: 203050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:57:31,361-Speed 2633.73 samples/sec   Loss 2.2972   LearningRate 0.0033   Epoch: 16   Global Step: 203060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:57:34,723-Speed 3046.30 samples/sec   Loss 2.2264   LearningRate 0.0033   Epoch: 16   Global Step: 203070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:57:38,107-Speed 3026.91 samples/sec   Loss 2.2305   LearningRate 0.0033   Epoch: 16   Global Step: 203080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:57:41,454-Speed 3060.54 samples/sec   Loss 2.3494   LearningRate 0.0033   Epoch: 16   Global Step: 203090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:57:44,867-Speed 3001.12 samples/sec   Loss 2.3066   LearningRate 0.0033   Epoch: 16   Global Step: 203100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:57:48,222-Speed 3053.50 samples/sec   Loss 2.2434   LearningRate 0.0033   Epoch: 16   Global Step: 203110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:57:51,551-Speed 3076.25 samples/sec   Loss 2.2290   LearningRate 0.0033   Epoch: 16   Global Step: 203120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:57:54,951-Speed 3013.09 samples/sec   Loss 2.2220   LearningRate 0.0033   Epoch: 16   Global Step: 203130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:57:58,364-Speed 3001.26 samples/sec   Loss 2.2013   LearningRate 0.0033   Epoch: 16   Global Step: 203140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:58:01,777-Speed 3000.20 samples/sec   Loss 2.2211   LearningRate 0.0033   Epoch: 16   Global Step: 203150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 20:58:05,158-Speed 3031.13 samples/sec   Loss 2.2787   LearningRate 0.0033   Epoch: 16   Global Step: 203160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:08,514-Speed 3052.14 samples/sec   Loss 2.2198   LearningRate 0.0033   Epoch: 16   Global Step: 203170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:11,873-Speed 3049.22 samples/sec   Loss 2.2268   LearningRate 0.0033   Epoch: 16   Global Step: 203180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:15,199-Speed 3079.59 samples/sec   Loss 2.2367   LearningRate 0.0033   Epoch: 16   Global Step: 203190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:18,573-Speed 3035.97 samples/sec   Loss 2.2998   LearningRate 0.0033   Epoch: 16   Global Step: 203200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:21,937-Speed 3045.11 samples/sec   Loss 2.2873   LearningRate 0.0033   Epoch: 16   Global Step: 203210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:25,313-Speed 3033.90 samples/sec   Loss 2.2493   LearningRate 0.0033   Epoch: 16   Global Step: 203220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:28,723-Speed 3004.06 samples/sec   Loss 2.2835   LearningRate 0.0033   Epoch: 16   Global Step: 203230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:32,091-Speed 3040.73 samples/sec   Loss 2.1393   LearningRate 0.0033   Epoch: 16   Global Step: 203240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:35,479-Speed 3023.77 samples/sec   Loss 2.3126   LearningRate 0.0033   Epoch: 16   Global Step: 203250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:38,842-Speed 3044.99 samples/sec   Loss 2.2546   LearningRate 0.0033   Epoch: 16   Global Step: 203260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:42,204-Speed 3047.16 samples/sec   Loss 2.3158   LearningRate 0.0033   Epoch: 16   Global Step: 203270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:45,552-Speed 3059.14 samples/sec   Loss 2.1863   LearningRate 0.0033   Epoch: 16   Global Step: 203280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:48,954-Speed 3011.06 samples/sec   Loss 2.2645   LearningRate 0.0033   Epoch: 16   Global Step: 203290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:52,295-Speed 3065.52 samples/sec   Loss 2.2044   LearningRate 0.0033   Epoch: 16   Global Step: 203300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:55,621-Speed 3079.46 samples/sec   Loss 2.3247   LearningRate 0.0033   Epoch: 16   Global Step: 203310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:58:59,031-Speed 3004.02 samples/sec   Loss 2.2854   LearningRate 0.0033   Epoch: 16   Global Step: 203320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:02,497-Speed 2955.67 samples/sec   Loss 2.1985   LearningRate 0.0033   Epoch: 16   Global Step: 203330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:05,891-Speed 3017.52 samples/sec   Loss 2.2634   LearningRate 0.0033   Epoch: 16   Global Step: 203340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:09,268-Speed 3033.36 samples/sec   Loss 2.2918   LearningRate 0.0033   Epoch: 16   Global Step: 203350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:12,604-Speed 3070.83 samples/sec   Loss 2.3423   LearningRate 0.0033   Epoch: 16   Global Step: 203360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:59:15,923-Speed 3086.29 samples/sec   Loss 2.2751   LearningRate 0.0033   Epoch: 16   Global Step: 203370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:19,916-Speed 2564.52 samples/sec   Loss 2.2451   LearningRate 0.0033   Epoch: 16   Global Step: 203380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:23,296-Speed 3030.96 samples/sec   Loss 2.2175   LearningRate 0.0033   Epoch: 16   Global Step: 203390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:26,724-Speed 2987.76 samples/sec   Loss 2.3047   LearningRate 0.0033   Epoch: 16   Global Step: 203400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:30,784-Speed 2522.73 samples/sec   Loss 2.2690   LearningRate 0.0033   Epoch: 16   Global Step: 203410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:34,164-Speed 3030.55 samples/sec   Loss 2.3443   LearningRate 0.0033   Epoch: 16   Global Step: 203420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:37,539-Speed 3035.10 samples/sec   Loss 2.3136   LearningRate 0.0033   Epoch: 16   Global Step: 203430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:40,962-Speed 2992.37 samples/sec   Loss 2.2786   LearningRate 0.0033   Epoch: 16   Global Step: 203440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:44,438-Speed 2947.57 samples/sec   Loss 2.2525   LearningRate 0.0033   Epoch: 16   Global Step: 203450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:47,775-Speed 3069.23 samples/sec   Loss 2.2534   LearningRate 0.0033   Epoch: 16   Global Step: 203460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:51,151-Speed 3033.55 samples/sec   Loss 2.2947   LearningRate 0.0033   Epoch: 16   Global Step: 203470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 20:59:54,502-Speed 3056.51 samples/sec   Loss 2.3255   LearningRate 0.0033   Epoch: 16   Global Step: 203480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 20:59:57,879-Speed 3033.38 samples/sec   Loss 2.2443   LearningRate 0.0033   Epoch: 16   Global Step: 203490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:00:01,250-Speed 3039.41 samples/sec   Loss 2.2852   LearningRate 0.0033   Epoch: 16   Global Step: 203500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:00:04,645-Speed 3016.86 samples/sec   Loss 2.2903   LearningRate 0.0033   Epoch: 16   Global Step: 203510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:00:08,037-Speed 3019.78 samples/sec   Loss 2.2777   LearningRate 0.0033   Epoch: 16   Global Step: 203520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:00:11,445-Speed 3005.16 samples/sec   Loss 2.2203   LearningRate 0.0033   Epoch: 16   Global Step: 203530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:00:14,886-Speed 2977.02 samples/sec   Loss 2.2836   LearningRate 0.0033   Epoch: 16   Global Step: 203540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:00:18,327-Speed 2976.41 samples/sec   Loss 2.1860   LearningRate 0.0033   Epoch: 16   Global Step: 203550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:00:21,662-Speed 3071.62 samples/sec   Loss 2.2420   LearningRate 0.0033   Epoch: 16   Global Step: 203560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:00:25,105-Speed 2975.35 samples/sec   Loss 2.2817   LearningRate 0.0033   Epoch: 16   Global Step: 203570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:00:28,450-Speed 3062.05 samples/sec   Loss 2.2729   LearningRate 0.0033   Epoch: 16   Global Step: 203580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:00:31,776-Speed 3079.83 samples/sec   Loss 2.2516   LearningRate 0.0033   Epoch: 16   Global Step: 203590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:00:35,231-Speed 2963.90 samples/sec   Loss 2.2853   LearningRate 0.0033   Epoch: 16   Global Step: 203600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:00:38,609-Speed 3032.04 samples/sec   Loss 2.2297   LearningRate 0.0033   Epoch: 16   Global Step: 203610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:00:42,005-Speed 3016.89 samples/sec   Loss 2.3836   LearningRate 0.0033   Epoch: 16   Global Step: 203620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:00:45,414-Speed 3004.63 samples/sec   Loss 2.2157   LearningRate 0.0033   Epoch: 16   Global Step: 203630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:00:48,751-Speed 3068.89 samples/sec   Loss 2.2207   LearningRate 0.0032   Epoch: 16   Global Step: 203640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:00:52,074-Speed 3083.41 samples/sec   Loss 2.2642   LearningRate 0.0032   Epoch: 16   Global Step: 203650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:00:55,442-Speed 3041.05 samples/sec   Loss 2.3242   LearningRate 0.0032   Epoch: 16   Global Step: 203660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:00:58,796-Speed 3053.38 samples/sec   Loss 2.3477   LearningRate 0.0032   Epoch: 16   Global Step: 203670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:01:02,230-Speed 2983.04 samples/sec   Loss 2.2584   LearningRate 0.0032   Epoch: 16   Global Step: 203680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:01:05,627-Speed 3014.94 samples/sec   Loss 2.2506   LearningRate 0.0032   Epoch: 16   Global Step: 203690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:01:09,030-Speed 3009.89 samples/sec   Loss 2.2735   LearningRate 0.0032   Epoch: 16   Global Step: 203700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:01:12,399-Speed 3040.26 samples/sec   Loss 2.2137   LearningRate 0.0032   Epoch: 16   Global Step: 203710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:01:15,802-Speed 3010.35 samples/sec   Loss 2.2972   LearningRate 0.0032   Epoch: 16   Global Step: 203720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:01:19,177-Speed 3035.37 samples/sec   Loss 2.2631   LearningRate 0.0032   Epoch: 16   Global Step: 203730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:01:22,523-Speed 3061.12 samples/sec   Loss 2.2362   LearningRate 0.0032   Epoch: 16   Global Step: 203740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:01:25,879-Speed 3052.08 samples/sec   Loss 2.2891   LearningRate 0.0032   Epoch: 16   Global Step: 203750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:01:29,282-Speed 3010.68 samples/sec   Loss 2.2680   LearningRate 0.0032   Epoch: 16   Global Step: 203760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:01:32,725-Speed 2974.72 samples/sec   Loss 2.2515   LearningRate 0.0032   Epoch: 16   Global Step: 203770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:01:36,111-Speed 3024.94 samples/sec   Loss 2.2657   LearningRate 0.0032   Epoch: 16   Global Step: 203780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:01:39,466-Speed 3053.48 samples/sec   Loss 2.2424   LearningRate 0.0032   Epoch: 16   Global Step: 203790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:01:42,840-Speed 3036.32 samples/sec   Loss 2.3012   LearningRate 0.0032   Epoch: 16   Global Step: 203800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:01:46,237-Speed 3014.80 samples/sec   Loss 2.2584   LearningRate 0.0032   Epoch: 16   Global Step: 203810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:01:49,703-Speed 2955.45 samples/sec   Loss 2.2580   LearningRate 0.0032   Epoch: 16   Global Step: 203820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:01:53,108-Speed 3008.06 samples/sec   Loss 2.3344   LearningRate 0.0032   Epoch: 16   Global Step: 203830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:01:56,461-Speed 3056.12 samples/sec   Loss 2.3565   LearningRate 0.0032   Epoch: 16   Global Step: 203840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:01:59,834-Speed 3036.66 samples/sec   Loss 2.2516   LearningRate 0.0032   Epoch: 16   Global Step: 203850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:02:03,213-Speed 3031.61 samples/sec   Loss 2.3029   LearningRate 0.0032   Epoch: 16   Global Step: 203860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:02:06,546-Speed 3073.66 samples/sec   Loss 2.3014   LearningRate 0.0032   Epoch: 16   Global Step: 203870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:02:09,886-Speed 3065.83 samples/sec   Loss 2.3204   LearningRate 0.0032   Epoch: 16   Global Step: 203880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:02:13,308-Speed 2994.04 samples/sec   Loss 2.2665   LearningRate 0.0032   Epoch: 16   Global Step: 203890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:02:16,657-Speed 3057.96 samples/sec   Loss 2.2163   LearningRate 0.0032   Epoch: 16   Global Step: 203900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:02:20,077-Speed 2995.62 samples/sec   Loss 2.2080   LearningRate 0.0032   Epoch: 16   Global Step: 203910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:02:23,507-Speed 2986.42 samples/sec   Loss 2.3189   LearningRate 0.0032   Epoch: 16   Global Step: 203920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:02:26,939-Speed 2984.30 samples/sec   Loss 2.3044   LearningRate 0.0032   Epoch: 16   Global Step: 203930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:02:30,276-Speed 3069.74 samples/sec   Loss 2.2425   LearningRate 0.0032   Epoch: 16   Global Step: 203940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:02:33,709-Speed 2983.81 samples/sec   Loss 2.3223   LearningRate 0.0032   Epoch: 16   Global Step: 203950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:02:37,131-Speed 2992.45 samples/sec   Loss 2.2297   LearningRate 0.0032   Epoch: 16   Global Step: 203960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:02:40,591-Speed 2961.27 samples/sec   Loss 2.3127   LearningRate 0.0032   Epoch: 16   Global Step: 203970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:02:43,934-Speed 3063.85 samples/sec   Loss 2.2426   LearningRate 0.0032   Epoch: 16   Global Step: 203980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:02:47,340-Speed 3006.91 samples/sec   Loss 2.2573   LearningRate 0.0032   Epoch: 16   Global Step: 203990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:02:50,652-Speed 3092.89 samples/sec   Loss 2.2666   LearningRate 0.0032   Epoch: 16   Global Step: 204000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:02:53,991-Speed 3068.04 samples/sec   Loss 2.2604   LearningRate 0.0032   Epoch: 16   Global Step: 204010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:02:57,463-Speed 2949.63 samples/sec   Loss 2.2702   LearningRate 0.0032   Epoch: 16   Global Step: 204020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:03:00,918-Speed 2965.05 samples/sec   Loss 2.3414   LearningRate 0.0032   Epoch: 16   Global Step: 204030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:03:04,274-Speed 3052.26 samples/sec   Loss 2.2609   LearningRate 0.0032   Epoch: 16   Global Step: 204040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:03:07,652-Speed 3031.55 samples/sec   Loss 2.2938   LearningRate 0.0032   Epoch: 16   Global Step: 204050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:03:10,965-Speed 3092.01 samples/sec   Loss 2.2278   LearningRate 0.0032   Epoch: 16   Global Step: 204060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:03:14,325-Speed 3048.70 samples/sec   Loss 2.2701   LearningRate 0.0032   Epoch: 16   Global Step: 204070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:03:17,764-Speed 2977.76 samples/sec   Loss 2.2341   LearningRate 0.0032   Epoch: 16   Global Step: 204080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:03:21,160-Speed 3016.89 samples/sec   Loss 2.2784   LearningRate 0.0032   Epoch: 16   Global Step: 204090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:03:24,483-Speed 3081.65 samples/sec   Loss 2.2123   LearningRate 0.0032   Epoch: 16   Global Step: 204100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:03:27,862-Speed 3031.60 samples/sec   Loss 2.2594   LearningRate 0.0032   Epoch: 16   Global Step: 204110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:03:31,315-Speed 2966.62 samples/sec   Loss 2.2798   LearningRate 0.0032   Epoch: 16   Global Step: 204120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:03:34,684-Speed 3039.88 samples/sec   Loss 2.2701   LearningRate 0.0032   Epoch: 16   Global Step: 204130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:03:38,061-Speed 3033.07 samples/sec   Loss 2.2844   LearningRate 0.0032   Epoch: 16   Global Step: 204140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:03:41,408-Speed 3060.93 samples/sec   Loss 2.2299   LearningRate 0.0032   Epoch: 16   Global Step: 204150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:03:44,778-Speed 3038.80 samples/sec   Loss 2.2787   LearningRate 0.0032   Epoch: 16   Global Step: 204160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:03:48,138-Speed 3048.49 samples/sec   Loss 2.3064   LearningRate 0.0032   Epoch: 16   Global Step: 204170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:03:51,516-Speed 3032.26 samples/sec   Loss 2.2762   LearningRate 0.0032   Epoch: 16   Global Step: 204180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:03:54,895-Speed 3031.62 samples/sec   Loss 2.2756   LearningRate 0.0032   Epoch: 16   Global Step: 204190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:03:58,302-Speed 3006.48 samples/sec   Loss 2.2814   LearningRate 0.0032   Epoch: 16   Global Step: 204200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:01,707-Speed 3008.19 samples/sec   Loss 2.2828   LearningRate 0.0032   Epoch: 16   Global Step: 204210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:05,166-Speed 2961.48 samples/sec   Loss 2.2824   LearningRate 0.0032   Epoch: 16   Global Step: 204220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:08,567-Speed 3011.63 samples/sec   Loss 2.2362   LearningRate 0.0032   Epoch: 16   Global Step: 204230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:11,983-Speed 2998.04 samples/sec   Loss 2.2184   LearningRate 0.0032   Epoch: 16   Global Step: 204240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:15,391-Speed 3005.73 samples/sec   Loss 2.3477   LearningRate 0.0032   Epoch: 16   Global Step: 204250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:18,807-Speed 2998.64 samples/sec   Loss 2.2144   LearningRate 0.0032   Epoch: 16   Global Step: 204260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:22,176-Speed 3040.82 samples/sec   Loss 2.2475   LearningRate 0.0032   Epoch: 16   Global Step: 204270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:25,538-Speed 3046.44 samples/sec   Loss 2.2687   LearningRate 0.0032   Epoch: 16   Global Step: 204280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:04:28,929-Speed 3020.18 samples/sec   Loss 2.3220   LearningRate 0.0032   Epoch: 16   Global Step: 204290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:04:32,275-Speed 3061.97 samples/sec   Loss 2.2844   LearningRate 0.0032   Epoch: 16   Global Step: 204300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:35,651-Speed 3033.72 samples/sec   Loss 2.3029   LearningRate 0.0032   Epoch: 16   Global Step: 204310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:38,977-Speed 3080.48 samples/sec   Loss 2.2994   LearningRate 0.0032   Epoch: 16   Global Step: 204320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:42,379-Speed 3010.40 samples/sec   Loss 2.3596   LearningRate 0.0031   Epoch: 16   Global Step: 204330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:45,722-Speed 3063.69 samples/sec   Loss 2.3350   LearningRate 0.0031   Epoch: 16   Global Step: 204340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:04:49,082-Speed 3048.48 samples/sec   Loss 2.3192   LearningRate 0.0031   Epoch: 16   Global Step: 204350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:04:52,502-Speed 2995.61 samples/sec   Loss 2.2337   LearningRate 0.0031   Epoch: 16   Global Step: 204360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:04:55,915-Speed 3000.44 samples/sec   Loss 2.3048   LearningRate 0.0031   Epoch: 16   Global Step: 204370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:04:59,314-Speed 3014.07 samples/sec   Loss 2.2724   LearningRate 0.0031   Epoch: 16   Global Step: 204380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:05:02,724-Speed 3004.08 samples/sec   Loss 2.2794   LearningRate 0.0031   Epoch: 16   Global Step: 204390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:05:06,074-Speed 3057.50 samples/sec   Loss 2.2926   LearningRate 0.0031   Epoch: 16   Global Step: 204400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:05:09,425-Speed 3056.68 samples/sec   Loss 2.2045   LearningRate 0.0031   Epoch: 16   Global Step: 204410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:05:12,753-Speed 3077.74 samples/sec   Loss 2.3383   LearningRate 0.0031   Epoch: 16   Global Step: 204420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:05:16,096-Speed 3064.11 samples/sec   Loss 2.3298   LearningRate 0.0031   Epoch: 16   Global Step: 204430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:05:19,505-Speed 3003.73 samples/sec   Loss 2.3501   LearningRate 0.0031   Epoch: 16   Global Step: 204440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:05:22,910-Speed 3008.82 samples/sec   Loss 2.2762   LearningRate 0.0031   Epoch: 16   Global Step: 204450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:05:26,302-Speed 3019.75 samples/sec   Loss 2.2369   LearningRate 0.0031   Epoch: 16   Global Step: 204460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:05:29,655-Speed 3054.26 samples/sec   Loss 2.3066   LearningRate 0.0031   Epoch: 16   Global Step: 204470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:05:33,027-Speed 3038.04 samples/sec   Loss 2.2611   LearningRate 0.0031   Epoch: 16   Global Step: 204480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:05:36,402-Speed 3035.07 samples/sec   Loss 2.3251   LearningRate 0.0031   Epoch: 16   Global Step: 204490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:05:39,775-Speed 3036.32 samples/sec   Loss 2.3018   LearningRate 0.0031   Epoch: 16   Global Step: 204500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:05:43,288-Speed 2915.59 samples/sec   Loss 2.2885   LearningRate 0.0031   Epoch: 16   Global Step: 204510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:05:46,762-Speed 2948.57 samples/sec   Loss 2.3325   LearningRate 0.0031   Epoch: 16   Global Step: 204520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:05:50,203-Speed 2976.93 samples/sec   Loss 2.2792   LearningRate 0.0031   Epoch: 16   Global Step: 204530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:05:53,689-Speed 2938.10 samples/sec   Loss 2.2510   LearningRate 0.0031   Epoch: 16   Global Step: 204540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:05:57,106-Speed 2997.12 samples/sec   Loss 2.3028   LearningRate 0.0031   Epoch: 16   Global Step: 204550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:06:00,494-Speed 3024.26 samples/sec   Loss 2.3497   LearningRate 0.0031   Epoch: 16   Global Step: 204560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:06:03,797-Speed 3100.90 samples/sec   Loss 2.2914   LearningRate 0.0031   Epoch: 16   Global Step: 204570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:06:07,266-Speed 2952.12 samples/sec   Loss 2.2821   LearningRate 0.0031   Epoch: 16   Global Step: 204580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:06:10,647-Speed 3029.41 samples/sec   Loss 2.3262   LearningRate 0.0031   Epoch: 16   Global Step: 204590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:14,006-Speed 3049.31 samples/sec   Loss 2.3155   LearningRate 0.0031   Epoch: 16   Global Step: 204600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:17,425-Speed 2996.14 samples/sec   Loss 2.2819   LearningRate 0.0031   Epoch: 16   Global Step: 204610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:20,869-Speed 2973.78 samples/sec   Loss 2.2486   LearningRate 0.0031   Epoch: 16   Global Step: 204620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:24,327-Speed 2962.42 samples/sec   Loss 2.2321   LearningRate 0.0031   Epoch: 16   Global Step: 204630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:27,772-Speed 2973.05 samples/sec   Loss 2.2781   LearningRate 0.0031   Epoch: 16   Global Step: 204640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:31,135-Speed 3046.67 samples/sec   Loss 2.2400   LearningRate 0.0031   Epoch: 16   Global Step: 204650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:34,520-Speed 3025.16 samples/sec   Loss 2.2732   LearningRate 0.0031   Epoch: 16   Global Step: 204660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:37,904-Speed 3027.47 samples/sec   Loss 2.2914   LearningRate 0.0031   Epoch: 16   Global Step: 204670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:41,233-Speed 3077.34 samples/sec   Loss 2.3569   LearningRate 0.0031   Epoch: 16   Global Step: 204680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:44,553-Speed 3085.23 samples/sec   Loss 2.3455   LearningRate 0.0031   Epoch: 16   Global Step: 204690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:47,955-Speed 3010.63 samples/sec   Loss 2.3180   LearningRate 0.0031   Epoch: 16   Global Step: 204700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:51,334-Speed 3032.51 samples/sec   Loss 2.2557   LearningRate 0.0031   Epoch: 16   Global Step: 204710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:54,719-Speed 3026.32 samples/sec   Loss 2.2977   LearningRate 0.0031   Epoch: 16   Global Step: 204720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:06:58,101-Speed 3028.37 samples/sec   Loss 2.2847   LearningRate 0.0031   Epoch: 16   Global Step: 204730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:07:01,494-Speed 3018.87 samples/sec   Loss 2.2635   LearningRate 0.0031   Epoch: 16   Global Step: 204740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:07:04,816-Speed 3083.77 samples/sec   Loss 2.3125   LearningRate 0.0031   Epoch: 16   Global Step: 204750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:07:08,122-Speed 3098.12 samples/sec   Loss 2.2968   LearningRate 0.0031   Epoch: 16   Global Step: 204760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:07:11,534-Speed 3001.93 samples/sec   Loss 2.3120   LearningRate 0.0031   Epoch: 16   Global Step: 204770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:07:14,872-Speed 3068.71 samples/sec   Loss 2.3100   LearningRate 0.0031   Epoch: 16   Global Step: 204780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:07:18,206-Speed 3072.18 samples/sec   Loss 2.3547   LearningRate 0.0031   Epoch: 16   Global Step: 204790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:07:21,593-Speed 3024.41 samples/sec   Loss 2.2552   LearningRate 0.0031   Epoch: 16   Global Step: 204800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:07:24,935-Speed 3064.57 samples/sec   Loss 2.3043   LearningRate 0.0031   Epoch: 16   Global Step: 204810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:07:28,302-Speed 3041.72 samples/sec   Loss 2.3527   LearningRate 0.0031   Epoch: 16   Global Step: 204820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:07:31,744-Speed 2976.48 samples/sec   Loss 2.3371   LearningRate 0.0031   Epoch: 16   Global Step: 204830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:07:35,125-Speed 3029.16 samples/sec   Loss 2.2639   LearningRate 0.0031   Epoch: 16   Global Step: 204840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:07:38,489-Speed 3045.01 samples/sec   Loss 2.3521   LearningRate 0.0031   Epoch: 16   Global Step: 204850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:07:41,844-Speed 3052.66 samples/sec   Loss 2.2957   LearningRate 0.0031   Epoch: 16   Global Step: 204860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:07:45,207-Speed 3046.26 samples/sec   Loss 2.3084   LearningRate 0.0031   Epoch: 16   Global Step: 204870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:07:48,531-Speed 3081.24 samples/sec   Loss 2.2794   LearningRate 0.0031   Epoch: 16   Global Step: 204880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:07:51,870-Speed 3068.00 samples/sec   Loss 2.3227   LearningRate 0.0031   Epoch: 16   Global Step: 204890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:07:55,195-Speed 3080.32 samples/sec   Loss 2.2713   LearningRate 0.0031   Epoch: 16   Global Step: 204900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:07:58,554-Speed 3048.87 samples/sec   Loss 2.2274   LearningRate 0.0031   Epoch: 16   Global Step: 204910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:08:01,974-Speed 2995.76 samples/sec   Loss 2.2898   LearningRate 0.0031   Epoch: 16   Global Step: 204920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:08:05,340-Speed 3042.50 samples/sec   Loss 2.3949   LearningRate 0.0031   Epoch: 16   Global Step: 204930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:08:08,707-Speed 3042.23 samples/sec   Loss 2.3984   LearningRate 0.0031   Epoch: 16   Global Step: 204940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:08:12,115-Speed 3005.72 samples/sec   Loss 2.2617   LearningRate 0.0031   Epoch: 16   Global Step: 204950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:08:15,449-Speed 3071.45 samples/sec   Loss 2.2719   LearningRate 0.0031   Epoch: 16   Global Step: 204960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:08:18,792-Speed 3063.96 samples/sec   Loss 2.2518   LearningRate 0.0031   Epoch: 16   Global Step: 204970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:08:22,140-Speed 3059.52 samples/sec   Loss 2.3152   LearningRate 0.0031   Epoch: 16   Global Step: 204980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:08:25,478-Speed 3068.98 samples/sec   Loss 2.3621   LearningRate 0.0031   Epoch: 16   Global Step: 204990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:08:28,844-Speed 3043.22 samples/sec   Loss 2.3149   LearningRate 0.0031   Epoch: 16   Global Step: 205000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:08:32,330-Speed 2937.86 samples/sec   Loss 2.2313   LearningRate 0.0031   Epoch: 16   Global Step: 205010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:08:35,807-Speed 2945.81 samples/sec   Loss 2.3269   LearningRate 0.0031   Epoch: 16   Global Step: 205020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:08:39,131-Speed 3082.35 samples/sec   Loss 2.3236   LearningRate 0.0031   Epoch: 16   Global Step: 205030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:08:42,496-Speed 3043.59 samples/sec   Loss 2.3880   LearningRate 0.0030   Epoch: 16   Global Step: 205040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:08:45,883-Speed 3023.97 samples/sec   Loss 2.3251   LearningRate 0.0030   Epoch: 16   Global Step: 205050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:08:49,227-Speed 3063.07 samples/sec   Loss 2.2325   LearningRate 0.0030   Epoch: 16   Global Step: 205060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:08:52,542-Speed 3089.37 samples/sec   Loss 2.3964   LearningRate 0.0030   Epoch: 16   Global Step: 205070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:08:55,907-Speed 3044.81 samples/sec   Loss 2.2818   LearningRate 0.0030   Epoch: 16   Global Step: 205080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:08:59,299-Speed 3019.43 samples/sec   Loss 2.3080   LearningRate 0.0030   Epoch: 16   Global Step: 205090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:09:02,739-Speed 2977.55 samples/sec   Loss 2.3102   LearningRate 0.0030   Epoch: 16   Global Step: 205100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:09:06,089-Speed 3057.59 samples/sec   Loss 2.2987   LearningRate 0.0030   Epoch: 16   Global Step: 205110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:09:09,449-Speed 3048.49 samples/sec   Loss 2.2686   LearningRate 0.0030   Epoch: 16   Global Step: 205120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:12,847-Speed 3014.36 samples/sec   Loss 2.3017   LearningRate 0.0030   Epoch: 16   Global Step: 205130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:16,178-Speed 3075.66 samples/sec   Loss 2.3454   LearningRate 0.0030   Epoch: 16   Global Step: 205140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:19,581-Speed 3009.69 samples/sec   Loss 2.3790   LearningRate 0.0030   Epoch: 16   Global Step: 205150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:22,987-Speed 3006.95 samples/sec   Loss 2.3083   LearningRate 0.0030   Epoch: 16   Global Step: 205160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:26,396-Speed 3004.73 samples/sec   Loss 2.3152   LearningRate 0.0030   Epoch: 16   Global Step: 205170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:29,801-Speed 3008.41 samples/sec   Loss 2.1981   LearningRate 0.0030   Epoch: 16   Global Step: 205180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:33,123-Speed 3083.13 samples/sec   Loss 2.2894   LearningRate 0.0030   Epoch: 16   Global Step: 205190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:36,502-Speed 3031.34 samples/sec   Loss 2.2770   LearningRate 0.0030   Epoch: 16   Global Step: 205200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:39,886-Speed 3026.80 samples/sec   Loss 2.2184   LearningRate 0.0030   Epoch: 16   Global Step: 205210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:43,233-Speed 3060.30 samples/sec   Loss 2.2920   LearningRate 0.0030   Epoch: 16   Global Step: 205220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:09:46,633-Speed 3012.87 samples/sec   Loss 2.3292   LearningRate 0.0030   Epoch: 16   Global Step: 205230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:50,058-Speed 2990.38 samples/sec   Loss 2.2816   LearningRate 0.0030   Epoch: 16   Global Step: 205240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:53,409-Speed 3056.65 samples/sec   Loss 2.3002   LearningRate 0.0030   Epoch: 16   Global Step: 205250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:09:56,745-Speed 3069.77 samples/sec   Loss 2.2514   LearningRate 0.0030   Epoch: 16   Global Step: 205260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:00,111-Speed 3043.64 samples/sec   Loss 2.3173   LearningRate 0.0030   Epoch: 16   Global Step: 205270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:03,483-Speed 3037.35 samples/sec   Loss 2.2985   LearningRate 0.0030   Epoch: 16   Global Step: 205280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:06,847-Speed 3045.19 samples/sec   Loss 2.2676   LearningRate 0.0030   Epoch: 16   Global Step: 205290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:10,159-Speed 3092.13 samples/sec   Loss 2.3528   LearningRate 0.0030   Epoch: 16   Global Step: 205300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:13,562-Speed 3010.29 samples/sec   Loss 2.3263   LearningRate 0.0030   Epoch: 16   Global Step: 205310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:17,047-Speed 2938.66 samples/sec   Loss 2.3086   LearningRate 0.0030   Epoch: 16   Global Step: 205320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:20,439-Speed 3019.63 samples/sec   Loss 2.2715   LearningRate 0.0030   Epoch: 16   Global Step: 205330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:23,795-Speed 3052.27 samples/sec   Loss 2.2243   LearningRate 0.0030   Epoch: 16   Global Step: 205340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:27,116-Speed 3084.32 samples/sec   Loss 2.3396   LearningRate 0.0030   Epoch: 16   Global Step: 205350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:30,503-Speed 3024.69 samples/sec   Loss 2.2578   LearningRate 0.0030   Epoch: 16   Global Step: 205360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:10:33,979-Speed 2946.28 samples/sec   Loss 2.3248   LearningRate 0.0030   Epoch: 16   Global Step: 205370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:10:37,337-Speed 3050.44 samples/sec   Loss 2.2768   LearningRate 0.0030   Epoch: 16   Global Step: 205380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:10:40,723-Speed 3025.15 samples/sec   Loss 2.2702   LearningRate 0.0030   Epoch: 16   Global Step: 205390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:10:44,053-Speed 3075.85 samples/sec   Loss 2.2932   LearningRate 0.0030   Epoch: 16   Global Step: 205400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:47,471-Speed 2996.85 samples/sec   Loss 2.3308   LearningRate 0.0030   Epoch: 16   Global Step: 205410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:50,855-Speed 3026.90 samples/sec   Loss 2.2919   LearningRate 0.0030   Epoch: 16   Global Step: 205420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:54,208-Speed 3054.84 samples/sec   Loss 2.3141   LearningRate 0.0030   Epoch: 16   Global Step: 205430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:10:57,574-Speed 3042.83 samples/sec   Loss 2.2542   LearningRate 0.0030   Epoch: 16   Global Step: 205440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:00,954-Speed 3030.85 samples/sec   Loss 2.2802   LearningRate 0.0030   Epoch: 16   Global Step: 205450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:04,370-Speed 2999.08 samples/sec   Loss 2.3227   LearningRate 0.0030   Epoch: 16   Global Step: 205460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:07,775-Speed 3007.81 samples/sec   Loss 2.3456   LearningRate 0.0030   Epoch: 16   Global Step: 205470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:11,113-Speed 3068.61 samples/sec   Loss 2.3141   LearningRate 0.0030   Epoch: 16   Global Step: 205480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:14,511-Speed 3014.41 samples/sec   Loss 2.2879   LearningRate 0.0030   Epoch: 16   Global Step: 205490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:17,870-Speed 3049.66 samples/sec   Loss 2.3141   LearningRate 0.0030   Epoch: 16   Global Step: 205500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:11:21,216-Speed 3061.34 samples/sec   Loss 2.3199   LearningRate 0.0030   Epoch: 16   Global Step: 205510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:24,604-Speed 3022.97 samples/sec   Loss 2.2957   LearningRate 0.0030   Epoch: 16   Global Step: 205520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:28,015-Speed 3003.07 samples/sec   Loss 2.2876   LearningRate 0.0030   Epoch: 16   Global Step: 205530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:31,423-Speed 3006.09 samples/sec   Loss 2.3089   LearningRate 0.0030   Epoch: 16   Global Step: 205540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:34,785-Speed 3046.22 samples/sec   Loss 2.2961   LearningRate 0.0030   Epoch: 16   Global Step: 205550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:38,122-Speed 3069.58 samples/sec   Loss 2.2715   LearningRate 0.0030   Epoch: 16   Global Step: 205560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:41,497-Speed 3034.80 samples/sec   Loss 2.3016   LearningRate 0.0030   Epoch: 16   Global Step: 205570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:44,832-Speed 3071.12 samples/sec   Loss 2.2974   LearningRate 0.0030   Epoch: 16   Global Step: 205580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:48,247-Speed 2999.70 samples/sec   Loss 2.2831   LearningRate 0.0030   Epoch: 16   Global Step: 205590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:51,705-Speed 2962.53 samples/sec   Loss 2.2880   LearningRate 0.0030   Epoch: 16   Global Step: 205600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:11:55,156-Speed 2967.58 samples/sec   Loss 2.3031   LearningRate 0.0030   Epoch: 16   Global Step: 205610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:11:58,607-Speed 2968.64 samples/sec   Loss 2.3024   LearningRate 0.0030   Epoch: 16   Global Step: 205620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:12:02,027-Speed 2995.06 samples/sec   Loss 2.3027   LearningRate 0.0030   Epoch: 16   Global Step: 205630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:12:05,461-Speed 2982.81 samples/sec   Loss 2.3577   LearningRate 0.0030   Epoch: 16   Global Step: 205640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:12:08,916-Speed 2963.97 samples/sec   Loss 2.2917   LearningRate 0.0030   Epoch: 16   Global Step: 205650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:12:12,273-Speed 3051.74 samples/sec   Loss 2.3139   LearningRate 0.0030   Epoch: 16   Global Step: 205660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:12:15,599-Speed 3079.56 samples/sec   Loss 2.2617   LearningRate 0.0030   Epoch: 16   Global Step: 205670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:12:18,927-Speed 3077.42 samples/sec   Loss 2.3129   LearningRate 0.0030   Epoch: 16   Global Step: 205680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:12:22,249-Speed 3083.65 samples/sec   Loss 2.3045   LearningRate 0.0030   Epoch: 16   Global Step: 205690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:12:25,639-Speed 3021.96 samples/sec   Loss 2.2591   LearningRate 0.0030   Epoch: 16   Global Step: 205700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:12:29,097-Speed 2961.46 samples/sec   Loss 2.3594   LearningRate 0.0030   Epoch: 16   Global Step: 205710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:12:32,534-Speed 2980.21 samples/sec   Loss 2.3334   LearningRate 0.0030   Epoch: 16   Global Step: 205720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:12:35,927-Speed 3019.56 samples/sec   Loss 2.2663   LearningRate 0.0030   Epoch: 16   Global Step: 205730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:12:39,359-Speed 2984.39 samples/sec   Loss 2.3243   LearningRate 0.0030   Epoch: 16   Global Step: 205740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:12:42,778-Speed 2996.21 samples/sec   Loss 2.2768   LearningRate 0.0030   Epoch: 16   Global Step: 205750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:12:46,175-Speed 3014.31 samples/sec   Loss 2.3197   LearningRate 0.0029   Epoch: 16   Global Step: 205760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:12:49,601-Speed 2990.54 samples/sec   Loss 2.3338   LearningRate 0.0029   Epoch: 16   Global Step: 205770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:12:52,994-Speed 3018.69 samples/sec   Loss 2.3330   LearningRate 0.0029   Epoch: 16   Global Step: 205780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:12:56,358-Speed 3045.22 samples/sec   Loss 2.2947   LearningRate 0.0029   Epoch: 16   Global Step: 205790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:12:59,744-Speed 3024.71 samples/sec   Loss 2.2566   LearningRate 0.0029   Epoch: 16   Global Step: 205800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:03,172-Speed 2987.87 samples/sec   Loss 2.2725   LearningRate 0.0029   Epoch: 16   Global Step: 205810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:06,562-Speed 3021.58 samples/sec   Loss 2.2715   LearningRate 0.0029   Epoch: 16   Global Step: 205820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:09,974-Speed 3002.68 samples/sec   Loss 2.2845   LearningRate 0.0029   Epoch: 16   Global Step: 205830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:13,363-Speed 3021.96 samples/sec   Loss 2.3351   LearningRate 0.0029   Epoch: 16   Global Step: 205840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:16,742-Speed 3031.02 samples/sec   Loss 2.3793   LearningRate 0.0029   Epoch: 16   Global Step: 205850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:20,065-Speed 3082.65 samples/sec   Loss 2.3038   LearningRate 0.0029   Epoch: 16   Global Step: 205860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:23,393-Speed 3078.26 samples/sec   Loss 2.3303   LearningRate 0.0029   Epoch: 16   Global Step: 205870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:26,746-Speed 3054.57 samples/sec   Loss 2.2989   LearningRate 0.0029   Epoch: 16   Global Step: 205880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:30,209-Speed 2957.53 samples/sec   Loss 2.2905   LearningRate 0.0029   Epoch: 16   Global Step: 205890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:13:33,566-Speed 3051.89 samples/sec   Loss 2.3342   LearningRate 0.0029   Epoch: 16   Global Step: 205900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:13:36,935-Speed 3040.16 samples/sec   Loss 2.2471   LearningRate 0.0029   Epoch: 16   Global Step: 205910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:13:40,352-Speed 2997.22 samples/sec   Loss 2.2541   LearningRate 0.0029   Epoch: 16   Global Step: 205920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:13:43,699-Speed 3061.02 samples/sec   Loss 2.2947   LearningRate 0.0029   Epoch: 16   Global Step: 205930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:47,101-Speed 3010.18 samples/sec   Loss 2.3462   LearningRate 0.0029   Epoch: 16   Global Step: 205940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:50,453-Speed 3055.91 samples/sec   Loss 2.3164   LearningRate 0.0029   Epoch: 16   Global Step: 205950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:53,831-Speed 3032.50 samples/sec   Loss 2.2932   LearningRate 0.0029   Epoch: 16   Global Step: 205960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:13:57,206-Speed 3035.15 samples/sec   Loss 2.3156   LearningRate 0.0029   Epoch: 16   Global Step: 205970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:14:00,606-Speed 3011.82 samples/sec   Loss 2.2408   LearningRate 0.0029   Epoch: 16   Global Step: 205980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:14:04,002-Speed 3016.92 samples/sec   Loss 2.3177   LearningRate 0.0029   Epoch: 16   Global Step: 205990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:14:07,389-Speed 3024.22 samples/sec   Loss 2.2550   LearningRate 0.0029   Epoch: 16   Global Step: 206000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:14:10,754-Speed 3043.54 samples/sec   Loss 2.3082   LearningRate 0.0029   Epoch: 16   Global Step: 206010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:14:14,131-Speed 3033.60 samples/sec   Loss 2.3075   LearningRate 0.0029   Epoch: 16   Global Step: 206020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:14:17,557-Speed 2989.18 samples/sec   Loss 2.3063   LearningRate 0.0029   Epoch: 16   Global Step: 206030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:14:20,981-Speed 2992.30 samples/sec   Loss 2.2990   LearningRate 0.0029   Epoch: 16   Global Step: 206040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:14:24,397-Speed 2997.72 samples/sec   Loss 2.3124   LearningRate 0.0029   Epoch: 16   Global Step: 206050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:14:27,885-Speed 2936.73 samples/sec   Loss 2.3967   LearningRate 0.0029   Epoch: 16   Global Step: 206060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:14:31,263-Speed 3032.60 samples/sec   Loss 2.2831   LearningRate 0.0029   Epoch: 16   Global Step: 206070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:14:34,679-Speed 2997.91 samples/sec   Loss 2.3249   LearningRate 0.0029   Epoch: 16   Global Step: 206080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:14:38,102-Speed 2992.87 samples/sec   Loss 2.3134   LearningRate 0.0029   Epoch: 16   Global Step: 206090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:14:41,532-Speed 2986.63 samples/sec   Loss 2.3232   LearningRate 0.0029   Epoch: 16   Global Step: 206100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:14:44,977-Speed 2972.73 samples/sec   Loss 2.3237   LearningRate 0.0029   Epoch: 16   Global Step: 206110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:14:48,322-Speed 3062.54 samples/sec   Loss 2.2957   LearningRate 0.0029   Epoch: 16   Global Step: 206120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:14:51,756-Speed 2982.23 samples/sec   Loss 2.2858   LearningRate 0.0029   Epoch: 16   Global Step: 206130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:14:55,167-Speed 3003.42 samples/sec   Loss 2.3679   LearningRate 0.0029   Epoch: 16   Global Step: 206140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:14:58,576-Speed 3005.02 samples/sec   Loss 2.2956   LearningRate 0.0029   Epoch: 16   Global Step: 206150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:15:01,981-Speed 3007.99 samples/sec   Loss 2.2548   LearningRate 0.0029   Epoch: 16   Global Step: 206160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:15:05,440-Speed 2960.74 samples/sec   Loss 2.2961   LearningRate 0.0029   Epoch: 16   Global Step: 206170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:15:08,829-Speed 3022.92 samples/sec   Loss 2.3926   LearningRate 0.0029   Epoch: 16   Global Step: 206180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:15:12,214-Speed 3025.70 samples/sec   Loss 2.2119   LearningRate 0.0029   Epoch: 16   Global Step: 206190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:15:15,591-Speed 3033.72 samples/sec   Loss 2.2999   LearningRate 0.0029   Epoch: 16   Global Step: 206200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:15:18,991-Speed 3012.24 samples/sec   Loss 2.3077   LearningRate 0.0029   Epoch: 16   Global Step: 206210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:15:22,376-Speed 3026.24 samples/sec   Loss 2.2677   LearningRate 0.0029   Epoch: 16   Global Step: 206220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:15:25,739-Speed 3045.46 samples/sec   Loss 2.2568   LearningRate 0.0029   Epoch: 16   Global Step: 206230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:15:29,182-Speed 2974.86 samples/sec   Loss 2.3158   LearningRate 0.0029   Epoch: 16   Global Step: 206240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:15:32,630-Speed 2970.50 samples/sec   Loss 2.3366   LearningRate 0.0029   Epoch: 16   Global Step: 206250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:15:36,037-Speed 3006.84 samples/sec   Loss 2.2952   LearningRate 0.0029   Epoch: 16   Global Step: 206260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:15:39,379-Speed 3064.76 samples/sec   Loss 2.2591   LearningRate 0.0029   Epoch: 16   Global Step: 206270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:15:42,829-Speed 2969.52 samples/sec   Loss 2.2886   LearningRate 0.0029   Epoch: 16   Global Step: 206280   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:15:46,246-Speed 2998.00 samples/sec   Loss 2.3265   LearningRate 0.0029   Epoch: 16   Global Step: 206290   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:15:49,706-Speed 2960.08 samples/sec   Loss 2.2955   LearningRate 0.0029   Epoch: 16   Global Step: 206300   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:15:53,169-Speed 2957.99 samples/sec   Loss 2.2893   LearningRate 0.0029   Epoch: 16   Global Step: 206310   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:15:56,545-Speed 3033.81 samples/sec   Loss 2.3019   LearningRate 0.0029   Epoch: 16   Global Step: 206320   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:15:59,912-Speed 3042.98 samples/sec   Loss 2.3374   LearningRate 0.0029   Epoch: 16   Global Step: 206330   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:16:03,273-Speed 3047.05 samples/sec   Loss 2.2277   LearningRate 0.0029   Epoch: 16   Global Step: 206340   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:16:06,633-Speed 3048.54 samples/sec   Loss 2.3030   LearningRate 0.0029   Epoch: 16   Global Step: 206350   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:16:10,019-Speed 3024.95 samples/sec   Loss 2.3414   LearningRate 0.0029   Epoch: 16   Global Step: 206360   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:16:13,485-Speed 2955.73 samples/sec   Loss 2.3294   LearningRate 0.0029   Epoch: 16   Global Step: 206370   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:16:16,912-Speed 2988.61 samples/sec   Loss 2.3304   LearningRate 0.0029   Epoch: 16   Global Step: 206380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:16:20,385-Speed 2948.91 samples/sec   Loss 2.3118   LearningRate 0.0029   Epoch: 16   Global Step: 206390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:16:23,764-Speed 3031.66 samples/sec   Loss 2.4101   LearningRate 0.0029   Epoch: 16   Global Step: 206400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:16:27,128-Speed 3044.88 samples/sec   Loss 2.3267   LearningRate 0.0029   Epoch: 16   Global Step: 206410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:16:30,514-Speed 3024.89 samples/sec   Loss 2.2868   LearningRate 0.0029   Epoch: 16   Global Step: 206420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:16:33,945-Speed 2986.01 samples/sec   Loss 2.2583   LearningRate 0.0029   Epoch: 16   Global Step: 206430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:16:37,284-Speed 3067.50 samples/sec   Loss 2.2985   LearningRate 0.0029   Epoch: 16   Global Step: 206440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:16:40,761-Speed 2945.39 samples/sec   Loss 2.3062   LearningRate 0.0029   Epoch: 16   Global Step: 206450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:16:44,160-Speed 3014.32 samples/sec   Loss 2.3448   LearningRate 0.0029   Epoch: 16   Global Step: 206460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:16:47,501-Speed 3065.92 samples/sec   Loss 2.3290   LearningRate 0.0029   Epoch: 16   Global Step: 206470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:16:50,935-Speed 2982.59 samples/sec   Loss 2.3193   LearningRate 0.0029   Epoch: 16   Global Step: 206480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:16:54,406-Speed 2951.09 samples/sec   Loss 2.2727   LearningRate 0.0028   Epoch: 16   Global Step: 206490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:16:57,857-Speed 2967.50 samples/sec   Loss 2.2476   LearningRate 0.0028   Epoch: 16   Global Step: 206500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:17:01,249-Speed 3019.54 samples/sec   Loss 2.3112   LearningRate 0.0028   Epoch: 16   Global Step: 206510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:17:04,673-Speed 2992.19 samples/sec   Loss 2.2911   LearningRate 0.0028   Epoch: 16   Global Step: 206520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:17:08,120-Speed 2971.18 samples/sec   Loss 2.2962   LearningRate 0.0028   Epoch: 16   Global Step: 206530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:17:11,501-Speed 3029.31 samples/sec   Loss 2.2715   LearningRate 0.0028   Epoch: 16   Global Step: 206540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:17:14,848-Speed 3061.08 samples/sec   Loss 2.3246   LearningRate 0.0028   Epoch: 16   Global Step: 206550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:17:18,273-Speed 2990.50 samples/sec   Loss 2.3439   LearningRate 0.0028   Epoch: 16   Global Step: 206560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:17:21,686-Speed 3000.75 samples/sec   Loss 2.2681   LearningRate 0.0028   Epoch: 16   Global Step: 206570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:17:25,059-Speed 3037.42 samples/sec   Loss 2.3731   LearningRate 0.0028   Epoch: 16   Global Step: 206580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:17:28,487-Speed 2988.05 samples/sec   Loss 2.3381   LearningRate 0.0028   Epoch: 16   Global Step: 206590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:17:31,851-Speed 3044.52 samples/sec   Loss 2.3020   LearningRate 0.0028   Epoch: 16   Global Step: 206600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:17:35,274-Speed 2992.85 samples/sec   Loss 2.3039   LearningRate 0.0028   Epoch: 16   Global Step: 206610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:17:38,699-Speed 2990.87 samples/sec   Loss 2.2568   LearningRate 0.0028   Epoch: 16   Global Step: 206620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:17:42,032-Speed 3073.15 samples/sec   Loss 2.3309   LearningRate 0.0028   Epoch: 16   Global Step: 206630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:17:46,208-Speed 2452.59 samples/sec   Loss 2.3457   LearningRate 0.0028   Epoch: 16   Global Step: 206640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:17:49,534-Speed 3080.25 samples/sec   Loss 2.3143   LearningRate 0.0028   Epoch: 16   Global Step: 206650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:17:52,910-Speed 3033.63 samples/sec   Loss 2.3022   LearningRate 0.0028   Epoch: 16   Global Step: 206660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:17:56,257-Speed 3060.10 samples/sec   Loss 2.3306   LearningRate 0.0028   Epoch: 16   Global Step: 206670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:17:59,703-Speed 2972.67 samples/sec   Loss 2.3143   LearningRate 0.0028   Epoch: 16   Global Step: 206680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:18:03,067-Speed 3044.96 samples/sec   Loss 2.3317   LearningRate 0.0028   Epoch: 16   Global Step: 206690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:18:06,435-Speed 3041.11 samples/sec   Loss 2.3780   LearningRate 0.0028   Epoch: 16   Global Step: 206700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:18:09,842-Speed 3006.77 samples/sec   Loss 2.3168   LearningRate 0.0028   Epoch: 16   Global Step: 206710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:18:13,216-Speed 3036.18 samples/sec   Loss 2.2683   LearningRate 0.0028   Epoch: 16   Global Step: 206720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:18:16,616-Speed 3012.63 samples/sec   Loss 2.3508   LearningRate 0.0028   Epoch: 16   Global Step: 206730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:18:20,052-Speed 2981.24 samples/sec   Loss 2.3231   LearningRate 0.0028   Epoch: 16   Global Step: 206740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:18:23,429-Speed 3033.14 samples/sec   Loss 2.2960   LearningRate 0.0028   Epoch: 16   Global Step: 206750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:18:26,791-Speed 3046.76 samples/sec   Loss 2.3245   LearningRate 0.0028   Epoch: 16   Global Step: 206760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:18:30,135-Speed 3062.87 samples/sec   Loss 2.3238   LearningRate 0.0028   Epoch: 16   Global Step: 206770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:18:33,543-Speed 3006.19 samples/sec   Loss 2.3271   LearningRate 0.0028   Epoch: 16   Global Step: 206780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:18:36,896-Speed 3054.07 samples/sec   Loss 2.2880   LearningRate 0.0028   Epoch: 16   Global Step: 206790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:18:40,324-Speed 2988.13 samples/sec   Loss 2.3424   LearningRate 0.0028   Epoch: 16   Global Step: 206800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:18:43,706-Speed 3029.59 samples/sec   Loss 2.3505   LearningRate 0.0028   Epoch: 16   Global Step: 206810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:18:47,084-Speed 3031.58 samples/sec   Loss 2.3439   LearningRate 0.0028   Epoch: 16   Global Step: 206820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:18:50,460-Speed 3034.51 samples/sec   Loss 2.3378   LearningRate 0.0028   Epoch: 16   Global Step: 206830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:18:53,837-Speed 3032.85 samples/sec   Loss 2.3389   LearningRate 0.0028   Epoch: 16   Global Step: 206840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:18:57,153-Speed 3088.32 samples/sec   Loss 2.2872   LearningRate 0.0028   Epoch: 16   Global Step: 206850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:19:00,600-Speed 2971.62 samples/sec   Loss 2.2674   LearningRate 0.0028   Epoch: 16   Global Step: 206860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:19:03,981-Speed 3029.71 samples/sec   Loss 2.3204   LearningRate 0.0028   Epoch: 16   Global Step: 206870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:19:07,371-Speed 3022.04 samples/sec   Loss 2.3045   LearningRate 0.0028   Epoch: 16   Global Step: 206880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:19:10,748-Speed 3032.92 samples/sec   Loss 2.3450   LearningRate 0.0028   Epoch: 16   Global Step: 206890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:19:14,172-Speed 2991.24 samples/sec   Loss 2.2872   LearningRate 0.0028   Epoch: 16   Global Step: 206900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:19:17,579-Speed 3006.08 samples/sec   Loss 2.3689   LearningRate 0.0028   Epoch: 16   Global Step: 206910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:19:20,929-Speed 3058.34 samples/sec   Loss 2.3007   LearningRate 0.0028   Epoch: 16   Global Step: 206920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:19:24,345-Speed 2997.84 samples/sec   Loss 2.2919   LearningRate 0.0028   Epoch: 16   Global Step: 206930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:19:27,730-Speed 3026.12 samples/sec   Loss 2.2934   LearningRate 0.0028   Epoch: 16   Global Step: 206940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:19:31,209-Speed 2944.53 samples/sec   Loss 2.3484   LearningRate 0.0028   Epoch: 16   Global Step: 206950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:19:34,613-Speed 3008.79 samples/sec   Loss 2.2716   LearningRate 0.0028   Epoch: 16   Global Step: 206960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:19:37,966-Speed 3055.48 samples/sec   Loss 2.2835   LearningRate 0.0028   Epoch: 16   Global Step: 206970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:19:41,346-Speed 3030.03 samples/sec   Loss 2.2505   LearningRate 0.0028   Epoch: 16   Global Step: 206980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:19:44,750-Speed 3009.45 samples/sec   Loss 2.3068   LearningRate 0.0028   Epoch: 16   Global Step: 206990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:19:48,134-Speed 3027.53 samples/sec   Loss 2.2920   LearningRate 0.0028   Epoch: 16   Global Step: 207000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:19:51,473-Speed 3067.56 samples/sec   Loss 2.3350   LearningRate 0.0028   Epoch: 16   Global Step: 207010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:19:54,875-Speed 3010.73 samples/sec   Loss 2.2726   LearningRate 0.0028   Epoch: 16   Global Step: 207020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:19:58,251-Speed 3034.22 samples/sec   Loss 2.3593   LearningRate 0.0028   Epoch: 16   Global Step: 207030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:20:01,651-Speed 3011.84 samples/sec   Loss 2.3014   LearningRate 0.0028   Epoch: 16   Global Step: 207040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:20:05,052-Speed 3011.57 samples/sec   Loss 2.2824   LearningRate 0.0028   Epoch: 16   Global Step: 207050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:20:08,511-Speed 2961.63 samples/sec   Loss 2.3685   LearningRate 0.0028   Epoch: 16   Global Step: 207060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:20:11,928-Speed 2997.37 samples/sec   Loss 2.2685   LearningRate 0.0028   Epoch: 16   Global Step: 207070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:20:15,231-Speed 3101.43 samples/sec   Loss 2.2611   LearningRate 0.0028   Epoch: 16   Global Step: 207080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:20:18,676-Speed 2973.13 samples/sec   Loss 2.2926   LearningRate 0.0028   Epoch: 16   Global Step: 207090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:20:22,044-Speed 3041.25 samples/sec   Loss 2.3799   LearningRate 0.0028   Epoch: 16   Global Step: 207100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:20:25,474-Speed 2986.35 samples/sec   Loss 2.2781   LearningRate 0.0028   Epoch: 16   Global Step: 207110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:20:28,859-Speed 3025.97 samples/sec   Loss 2.3407   LearningRate 0.0028   Epoch: 16   Global Step: 207120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:20:32,237-Speed 3032.85 samples/sec   Loss 2.3408   LearningRate 0.0028   Epoch: 16   Global Step: 207130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:20:35,574-Speed 3068.73 samples/sec   Loss 2.3841   LearningRate 0.0028   Epoch: 16   Global Step: 207140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:20:38,937-Speed 3046.49 samples/sec   Loss 2.3407   LearningRate 0.0028   Epoch: 16   Global Step: 207150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:20:42,300-Speed 3045.59 samples/sec   Loss 2.2239   LearningRate 0.0028   Epoch: 16   Global Step: 207160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:20:45,697-Speed 3015.30 samples/sec   Loss 2.2415   LearningRate 0.0028   Epoch: 16   Global Step: 207170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:20:49,074-Speed 3033.17 samples/sec   Loss 2.2912   LearningRate 0.0028   Epoch: 16   Global Step: 207180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:20:52,496-Speed 2992.62 samples/sec   Loss 2.2690   LearningRate 0.0028   Epoch: 16   Global Step: 207190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:20:55,850-Speed 3053.95 samples/sec   Loss 2.2918   LearningRate 0.0028   Epoch: 16   Global Step: 207200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:20:59,285-Speed 2982.48 samples/sec   Loss 2.2675   LearningRate 0.0028   Epoch: 16   Global Step: 207210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:02,766-Speed 2942.18 samples/sec   Loss 2.3359   LearningRate 0.0028   Epoch: 16   Global Step: 207220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:06,195-Speed 2987.19 samples/sec   Loss 2.3546   LearningRate 0.0027   Epoch: 16   Global Step: 207230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:09,589-Speed 3018.08 samples/sec   Loss 2.2874   LearningRate 0.0027   Epoch: 16   Global Step: 207240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:12,996-Speed 3006.69 samples/sec   Loss 2.3016   LearningRate 0.0027   Epoch: 16   Global Step: 207250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:16,321-Speed 3080.26 samples/sec   Loss 2.2859   LearningRate 0.0027   Epoch: 16   Global Step: 207260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:19,764-Speed 2974.92 samples/sec   Loss 2.2840   LearningRate 0.0027   Epoch: 16   Global Step: 207270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:23,115-Speed 3056.52 samples/sec   Loss 2.2172   LearningRate 0.0027   Epoch: 16   Global Step: 207280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:21:26,476-Speed 3048.50 samples/sec   Loss 2.2788   LearningRate 0.0027   Epoch: 16   Global Step: 207290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:21:29,902-Speed 2989.23 samples/sec   Loss 2.3227   LearningRate 0.0027   Epoch: 16   Global Step: 207300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:33,315-Speed 3000.65 samples/sec   Loss 2.2877   LearningRate 0.0027   Epoch: 16   Global Step: 207310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:36,729-Speed 3000.49 samples/sec   Loss 2.2705   LearningRate 0.0027   Epoch: 16   Global Step: 207320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:40,191-Speed 2958.63 samples/sec   Loss 2.3333   LearningRate 0.0027   Epoch: 16   Global Step: 207330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:43,615-Speed 2992.19 samples/sec   Loss 2.2936   LearningRate 0.0027   Epoch: 16   Global Step: 207340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:46,972-Speed 3050.65 samples/sec   Loss 2.3471   LearningRate 0.0027   Epoch: 16   Global Step: 207350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:50,384-Speed 3002.43 samples/sec   Loss 2.2394   LearningRate 0.0027   Epoch: 16   Global Step: 207360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:53,794-Speed 3003.92 samples/sec   Loss 2.3378   LearningRate 0.0027   Epoch: 16   Global Step: 207370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:21:57,195-Speed 3011.61 samples/sec   Loss 2.3013   LearningRate 0.0027   Epoch: 16   Global Step: 207380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:22:00,567-Speed 3037.59 samples/sec   Loss 2.3221   LearningRate 0.0027   Epoch: 16   Global Step: 207390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:22:03,887-Speed 3084.65 samples/sec   Loss 2.2942   LearningRate 0.0027   Epoch: 16   Global Step: 207400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:22:07,244-Speed 3051.15 samples/sec   Loss 2.3016   LearningRate 0.0027   Epoch: 16   Global Step: 207410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:22:10,612-Speed 3041.40 samples/sec   Loss 2.3344   LearningRate 0.0027   Epoch: 16   Global Step: 207420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:22:13,974-Speed 3047.10 samples/sec   Loss 2.2542   LearningRate 0.0027   Epoch: 16   Global Step: 207430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:22:17,424-Speed 2969.31 samples/sec   Loss 2.2739   LearningRate 0.0027   Epoch: 16   Global Step: 207440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:22:20,768-Speed 3063.16 samples/sec   Loss 2.3243   LearningRate 0.0027   Epoch: 16   Global Step: 207450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:22:24,175-Speed 3006.90 samples/sec   Loss 2.3513   LearningRate 0.0027   Epoch: 16   Global Step: 207460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:22:27,515-Speed 3066.75 samples/sec   Loss 2.3071   LearningRate 0.0027   Epoch: 16   Global Step: 207470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:22:30,881-Speed 3043.50 samples/sec   Loss 2.2576   LearningRate 0.0027   Epoch: 16   Global Step: 207480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:22:34,252-Speed 3038.27 samples/sec   Loss 2.2408   LearningRate 0.0027   Epoch: 16   Global Step: 207490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:22:37,587-Speed 3071.26 samples/sec   Loss 2.3379   LearningRate 0.0027   Epoch: 16   Global Step: 207500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:22:40,899-Speed 3092.65 samples/sec   Loss 2.3035   LearningRate 0.0027   Epoch: 16   Global Step: 207510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:22:44,221-Speed 3082.80 samples/sec   Loss 2.2449   LearningRate 0.0027   Epoch: 16   Global Step: 207520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:22:47,633-Speed 3002.70 samples/sec   Loss 2.2797   LearningRate 0.0027   Epoch: 16   Global Step: 207530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:22:51,028-Speed 3016.52 samples/sec   Loss 2.3031   LearningRate 0.0027   Epoch: 16   Global Step: 207540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:22:54,447-Speed 2996.12 samples/sec   Loss 2.2769   LearningRate 0.0027   Epoch: 16   Global Step: 207550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:22:57,760-Speed 3091.97 samples/sec   Loss 2.4068   LearningRate 0.0027   Epoch: 16   Global Step: 207560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:23:01,122-Speed 3046.29 samples/sec   Loss 2.2436   LearningRate 0.0027   Epoch: 16   Global Step: 207570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:23:04,462-Speed 3066.80 samples/sec   Loss 2.3079   LearningRate 0.0027   Epoch: 16   Global Step: 207580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:23:07,775-Speed 3091.70 samples/sec   Loss 2.3331   LearningRate 0.0027   Epoch: 16   Global Step: 207590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:23:11,123-Speed 3059.63 samples/sec   Loss 2.3228   LearningRate 0.0027   Epoch: 16   Global Step: 207600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:23:14,556-Speed 2984.32 samples/sec   Loss 2.2569   LearningRate 0.0027   Epoch: 16   Global Step: 207610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:23:17,928-Speed 3036.96 samples/sec   Loss 2.2991   LearningRate 0.0027   Epoch: 16   Global Step: 207620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:23:21,310-Speed 3029.54 samples/sec   Loss 2.3377   LearningRate 0.0027   Epoch: 16   Global Step: 207630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:23:24,696-Speed 3024.81 samples/sec   Loss 2.3344   LearningRate 0.0027   Epoch: 16   Global Step: 207640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:23:28,087-Speed 3020.90 samples/sec   Loss 2.3183   LearningRate 0.0027   Epoch: 16   Global Step: 207650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:23:31,437-Speed 3057.61 samples/sec   Loss 2.3175   LearningRate 0.0027   Epoch: 16   Global Step: 207660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:23:34,866-Speed 2986.95 samples/sec   Loss 2.2651   LearningRate 0.0027   Epoch: 16   Global Step: 207670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:23:38,233-Speed 3042.96 samples/sec   Loss 2.3655   LearningRate 0.0027   Epoch: 16   Global Step: 207680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:23:41,694-Speed 2959.16 samples/sec   Loss 2.2898   LearningRate 0.0027   Epoch: 16   Global Step: 207690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:23:45,110-Speed 2998.21 samples/sec   Loss 2.3169   LearningRate 0.0027   Epoch: 16   Global Step: 207700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:23:48,537-Speed 2988.86 samples/sec   Loss 2.3471   LearningRate 0.0027   Epoch: 16   Global Step: 207710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:23:51,894-Speed 3051.92 samples/sec   Loss 2.2556   LearningRate 0.0027   Epoch: 16   Global Step: 207720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:23:55,248-Speed 3053.82 samples/sec   Loss 2.2586   LearningRate 0.0027   Epoch: 16   Global Step: 207730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:23:58,706-Speed 2962.35 samples/sec   Loss 2.3405   LearningRate 0.0027   Epoch: 16   Global Step: 207740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:24:02,113-Speed 3006.34 samples/sec   Loss 2.2153   LearningRate 0.0027   Epoch: 16   Global Step: 207750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:24:05,513-Speed 3012.45 samples/sec   Loss 2.2778   LearningRate 0.0027   Epoch: 16   Global Step: 207760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:24:08,903-Speed 3021.79 samples/sec   Loss 2.3161   LearningRate 0.0027   Epoch: 16   Global Step: 207770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:24:12,284-Speed 3029.33 samples/sec   Loss 2.3665   LearningRate 0.0027   Epoch: 16   Global Step: 207780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:24:15,736-Speed 2967.30 samples/sec   Loss 2.3555   LearningRate 0.0027   Epoch: 16   Global Step: 207790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:24:19,230-Speed 2931.58 samples/sec   Loss 2.2792   LearningRate 0.0027   Epoch: 16   Global Step: 207800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:24:22,682-Speed 2967.11 samples/sec   Loss 2.3014   LearningRate 0.0027   Epoch: 16   Global Step: 207810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:24:26,141-Speed 2961.35 samples/sec   Loss 2.3326   LearningRate 0.0027   Epoch: 16   Global Step: 207820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:24:29,495-Speed 3054.57 samples/sec   Loss 2.3316   LearningRate 0.0027   Epoch: 16   Global Step: 207830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:24:32,865-Speed 3039.40 samples/sec   Loss 2.3075   LearningRate 0.0027   Epoch: 16   Global Step: 207840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:24:36,267-Speed 3010.70 samples/sec   Loss 2.2774   LearningRate 0.0027   Epoch: 16   Global Step: 207850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:24:39,662-Speed 3017.22 samples/sec   Loss 2.3099   LearningRate 0.0027   Epoch: 16   Global Step: 207860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:24:43,022-Speed 3048.01 samples/sec   Loss 2.2866   LearningRate 0.0027   Epoch: 16   Global Step: 207870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:24:46,385-Speed 3045.79 samples/sec   Loss 2.2791   LearningRate 0.0027   Epoch: 16   Global Step: 207880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:24:49,775-Speed 3021.89 samples/sec   Loss 2.3108   LearningRate 0.0027   Epoch: 16   Global Step: 207890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:24:53,124-Speed 3058.40 samples/sec   Loss 2.3286   LearningRate 0.0027   Epoch: 16   Global Step: 207900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:24:56,490-Speed 3043.15 samples/sec   Loss 2.3092   LearningRate 0.0027   Epoch: 16   Global Step: 207910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:24:59,831-Speed 3065.63 samples/sec   Loss 2.2320   LearningRate 0.0027   Epoch: 16   Global Step: 207920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:03,308-Speed 2945.70 samples/sec   Loss 2.3993   LearningRate 0.0027   Epoch: 16   Global Step: 207930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:06,682-Speed 3035.27 samples/sec   Loss 2.2760   LearningRate 0.0027   Epoch: 16   Global Step: 207940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:10,157-Speed 2947.66 samples/sec   Loss 2.3024   LearningRate 0.0027   Epoch: 16   Global Step: 207950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:13,640-Speed 2941.38 samples/sec   Loss 2.3068   LearningRate 0.0027   Epoch: 16   Global Step: 207960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:17,048-Speed 3005.33 samples/sec   Loss 2.3940   LearningRate 0.0027   Epoch: 16   Global Step: 207970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:20,493-Speed 2973.77 samples/sec   Loss 2.3324   LearningRate 0.0027   Epoch: 16   Global Step: 207980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:23,925-Speed 2984.53 samples/sec   Loss 2.2771   LearningRate 0.0026   Epoch: 16   Global Step: 207990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:27,244-Speed 3085.70 samples/sec   Loss 2.2696   LearningRate 0.0026   Epoch: 16   Global Step: 208000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:30,656-Speed 3002.15 samples/sec   Loss 2.3118   LearningRate 0.0026   Epoch: 16   Global Step: 208010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:34,109-Speed 2966.76 samples/sec   Loss 2.3321   LearningRate 0.0026   Epoch: 16   Global Step: 208020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:37,433-Speed 3081.06 samples/sec   Loss 2.3571   LearningRate 0.0026   Epoch: 16   Global Step: 208030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:40,783-Speed 3057.25 samples/sec   Loss 2.3606   LearningRate 0.0026   Epoch: 16   Global Step: 208040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:25:44,162-Speed 3031.54 samples/sec   Loss 2.3448   LearningRate 0.0026   Epoch: 16   Global Step: 208050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:25:47,475-Speed 3091.52 samples/sec   Loss 2.2806   LearningRate 0.0026   Epoch: 16   Global Step: 208060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:50,828-Speed 3055.24 samples/sec   Loss 2.3157   LearningRate 0.0026   Epoch: 16   Global Step: 208070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:54,244-Speed 2998.63 samples/sec   Loss 2.2817   LearningRate 0.0026   Epoch: 16   Global Step: 208080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:25:57,596-Speed 3055.02 samples/sec   Loss 2.2801   LearningRate 0.0026   Epoch: 16   Global Step: 208090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:26:01,081-Speed 2939.74 samples/sec   Loss 2.2013   LearningRate 0.0026   Epoch: 16   Global Step: 208100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:26:04,534-Speed 2966.36 samples/sec   Loss 2.2889   LearningRate 0.0026   Epoch: 16   Global Step: 208110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:26:07,888-Speed 3053.42 samples/sec   Loss 2.2898   LearningRate 0.0026   Epoch: 16   Global Step: 208120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:26:11,252-Speed 3045.30 samples/sec   Loss 2.3392   LearningRate 0.0026   Epoch: 16   Global Step: 208130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:26:14,671-Speed 2994.90 samples/sec   Loss 2.3089   LearningRate 0.0026   Epoch: 16   Global Step: 208140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:26:18,099-Speed 2988.94 samples/sec   Loss 2.2283   LearningRate 0.0026   Epoch: 16   Global Step: 208150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:26:21,436-Speed 3069.30 samples/sec   Loss 2.2973   LearningRate 0.0026   Epoch: 16   Global Step: 208160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:26:24,825-Speed 3025.35 samples/sec   Loss 2.3423   LearningRate 0.0026   Epoch: 16   Global Step: 208170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:26:28,256-Speed 2985.72 samples/sec   Loss 2.3168   LearningRate 0.0026   Epoch: 16   Global Step: 208180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:26:31,702-Speed 2972.84 samples/sec   Loss 2.2863   LearningRate 0.0026   Epoch: 16   Global Step: 208190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:26:35,182-Speed 2943.39 samples/sec   Loss 2.3400   LearningRate 0.0026   Epoch: 16   Global Step: 208200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:26:38,613-Speed 2985.36 samples/sec   Loss 2.3492   LearningRate 0.0026   Epoch: 16   Global Step: 208210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:26:41,966-Speed 3054.11 samples/sec   Loss 2.2876   LearningRate 0.0026   Epoch: 16   Global Step: 208220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:26:45,396-Speed 2986.55 samples/sec   Loss 2.3584   LearningRate 0.0026   Epoch: 16   Global Step: 208230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:26:48,714-Speed 3087.36 samples/sec   Loss 2.2769   LearningRate 0.0026   Epoch: 16   Global Step: 208240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:26:52,071-Speed 3050.76 samples/sec   Loss 2.2947   LearningRate 0.0026   Epoch: 16   Global Step: 208250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:26:55,459-Speed 3023.30 samples/sec   Loss 2.3188   LearningRate 0.0026   Epoch: 16   Global Step: 208260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:26:58,904-Speed 2974.03 samples/sec   Loss 2.3297   LearningRate 0.0026   Epoch: 16   Global Step: 208270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:02,261-Speed 3050.47 samples/sec   Loss 2.2652   LearningRate 0.0026   Epoch: 16   Global Step: 208280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:05,643-Speed 3028.66 samples/sec   Loss 2.3215   LearningRate 0.0026   Epoch: 16   Global Step: 208290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:09,034-Speed 3021.20 samples/sec   Loss 2.3725   LearningRate 0.0026   Epoch: 16   Global Step: 208300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:12,394-Speed 3048.81 samples/sec   Loss 2.2366   LearningRate 0.0026   Epoch: 16   Global Step: 208310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:15,824-Speed 2985.69 samples/sec   Loss 2.3519   LearningRate 0.0026   Epoch: 16   Global Step: 208320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:19,234-Speed 3004.38 samples/sec   Loss 2.3271   LearningRate 0.0026   Epoch: 16   Global Step: 208330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:22,580-Speed 3061.59 samples/sec   Loss 2.3150   LearningRate 0.0026   Epoch: 16   Global Step: 208340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:25,988-Speed 3005.05 samples/sec   Loss 2.2191   LearningRate 0.0026   Epoch: 16   Global Step: 208350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:29,386-Speed 3014.39 samples/sec   Loss 2.3170   LearningRate 0.0026   Epoch: 16   Global Step: 208360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:27:32,742-Speed 3052.35 samples/sec   Loss 2.2791   LearningRate 0.0026   Epoch: 16   Global Step: 208370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:36,110-Speed 3041.08 samples/sec   Loss 2.3192   LearningRate 0.0026   Epoch: 16   Global Step: 208380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:39,524-Speed 3000.08 samples/sec   Loss 2.3741   LearningRate 0.0026   Epoch: 16   Global Step: 208390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:42,959-Speed 2982.25 samples/sec   Loss 2.3000   LearningRate 0.0026   Epoch: 16   Global Step: 208400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:46,373-Speed 3000.07 samples/sec   Loss 2.2494   LearningRate 0.0026   Epoch: 16   Global Step: 208410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:49,859-Speed 2937.62 samples/sec   Loss 2.2843   LearningRate 0.0026   Epoch: 16   Global Step: 208420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:53,244-Speed 3026.42 samples/sec   Loss 2.3674   LearningRate 0.0026   Epoch: 16   Global Step: 208430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:56,546-Speed 3101.53 samples/sec   Loss 2.3110   LearningRate 0.0026   Epoch: 16   Global Step: 208440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:27:59,933-Speed 3025.12 samples/sec   Loss 2.2716   LearningRate 0.0026   Epoch: 16   Global Step: 208450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:28:03,298-Speed 3043.46 samples/sec   Loss 2.3042   LearningRate 0.0026   Epoch: 16   Global Step: 208460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:28:06,751-Speed 2966.67 samples/sec   Loss 2.2706   LearningRate 0.0026   Epoch: 16   Global Step: 208470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:28:10,137-Speed 3025.04 samples/sec   Loss 2.3086   LearningRate 0.0026   Epoch: 16   Global Step: 208480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:28:13,566-Speed 2986.59 samples/sec   Loss 2.3512   LearningRate 0.0026   Epoch: 16   Global Step: 208490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:28:16,897-Speed 3074.88 samples/sec   Loss 2.2546   LearningRate 0.0026   Epoch: 16   Global Step: 208500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:28:20,310-Speed 3001.26 samples/sec   Loss 2.3205   LearningRate 0.0026   Epoch: 16   Global Step: 208510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:28:23,797-Speed 2937.15 samples/sec   Loss 2.2874   LearningRate 0.0026   Epoch: 16   Global Step: 208520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:28:27,208-Speed 3003.64 samples/sec   Loss 2.3965   LearningRate 0.0026   Epoch: 16   Global Step: 208530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:28:30,589-Speed 3029.68 samples/sec   Loss 2.4210   LearningRate 0.0026   Epoch: 16   Global Step: 208540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:28:34,097-Speed 2919.85 samples/sec   Loss 2.3841   LearningRate 0.0026   Epoch: 16   Global Step: 208550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:28:37,531-Speed 2982.39 samples/sec   Loss 2.3093   LearningRate 0.0026   Epoch: 16   Global Step: 208560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:28:40,967-Speed 2981.14 samples/sec   Loss 2.3320   LearningRate 0.0026   Epoch: 16   Global Step: 208570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:28:44,292-Speed 3080.80 samples/sec   Loss 2.3497   LearningRate 0.0026   Epoch: 16   Global Step: 208580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:28:47,608-Speed 3089.29 samples/sec   Loss 2.2917   LearningRate 0.0026   Epoch: 16   Global Step: 208590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:28:51,027-Speed 2996.06 samples/sec   Loss 2.2606   LearningRate 0.0026   Epoch: 16   Global Step: 208600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:28:54,418-Speed 3020.62 samples/sec   Loss 2.2194   LearningRate 0.0026   Epoch: 16   Global Step: 208610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:28:57,875-Speed 2962.84 samples/sec   Loss 2.2555   LearningRate 0.0026   Epoch: 16   Global Step: 208620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:01,269-Speed 3018.44 samples/sec   Loss 2.3894   LearningRate 0.0026   Epoch: 16   Global Step: 208630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:04,636-Speed 3042.32 samples/sec   Loss 2.2807   LearningRate 0.0026   Epoch: 16   Global Step: 208640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:08,012-Speed 3033.73 samples/sec   Loss 2.2742   LearningRate 0.0026   Epoch: 16   Global Step: 208650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:11,380-Speed 3041.56 samples/sec   Loss 2.2724   LearningRate 0.0026   Epoch: 16   Global Step: 208660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:29:14,846-Speed 2954.60 samples/sec   Loss 2.2812   LearningRate 0.0026   Epoch: 16   Global Step: 208670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:18,174-Speed 3078.49 samples/sec   Loss 2.3158   LearningRate 0.0026   Epoch: 16   Global Step: 208680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:21,514-Speed 3066.66 samples/sec   Loss 2.2929   LearningRate 0.0026   Epoch: 16   Global Step: 208690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:24,879-Speed 3043.21 samples/sec   Loss 2.3412   LearningRate 0.0026   Epoch: 16   Global Step: 208700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:28,310-Speed 2986.48 samples/sec   Loss 2.2862   LearningRate 0.0026   Epoch: 16   Global Step: 208710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:31,681-Speed 3038.14 samples/sec   Loss 2.3261   LearningRate 0.0026   Epoch: 16   Global Step: 208720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:35,110-Speed 2987.13 samples/sec   Loss 2.3353   LearningRate 0.0026   Epoch: 16   Global Step: 208730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:38,578-Speed 2954.24 samples/sec   Loss 2.2784   LearningRate 0.0026   Epoch: 16   Global Step: 208740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:41,995-Speed 2997.83 samples/sec   Loss 2.3065   LearningRate 0.0026   Epoch: 16   Global Step: 208750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:45,364-Speed 3040.06 samples/sec   Loss 2.3101   LearningRate 0.0025   Epoch: 16   Global Step: 208760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:48,708-Speed 3063.10 samples/sec   Loss 2.3181   LearningRate 0.0025   Epoch: 16   Global Step: 208770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:29:52,025-Speed 3088.13 samples/sec   Loss 2.3130   LearningRate 0.0025   Epoch: 16   Global Step: 208780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:29:55,377-Speed 3056.15 samples/sec   Loss 2.2908   LearningRate 0.0025   Epoch: 16   Global Step: 208790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:29:58,819-Speed 2976.43 samples/sec   Loss 2.3229   LearningRate 0.0025   Epoch: 16   Global Step: 208800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:02,211-Speed 3019.99 samples/sec   Loss 2.3352   LearningRate 0.0025   Epoch: 16   Global Step: 208810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:05,620-Speed 3004.72 samples/sec   Loss 2.3490   LearningRate 0.0025   Epoch: 16   Global Step: 208820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:09,079-Speed 2961.15 samples/sec   Loss 2.3408   LearningRate 0.0025   Epoch: 16   Global Step: 208830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:12,517-Speed 2979.22 samples/sec   Loss 2.3121   LearningRate 0.0025   Epoch: 16   Global Step: 208840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:15,942-Speed 2990.32 samples/sec   Loss 2.2678   LearningRate 0.0025   Epoch: 16   Global Step: 208850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:19,373-Speed 2985.69 samples/sec   Loss 2.3524   LearningRate 0.0025   Epoch: 16   Global Step: 208860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:22,820-Speed 2971.85 samples/sec   Loss 2.3777   LearningRate 0.0025   Epoch: 16   Global Step: 208870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:26,293-Speed 2949.28 samples/sec   Loss 2.3144   LearningRate 0.0025   Epoch: 16   Global Step: 208880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:30:29,716-Speed 2992.75 samples/sec   Loss 2.2908   LearningRate 0.0025   Epoch: 16   Global Step: 208890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:30:33,113-Speed 3014.95 samples/sec   Loss 2.3536   LearningRate 0.0025   Epoch: 16   Global Step: 208900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:36,547-Speed 2982.92 samples/sec   Loss 2.2824   LearningRate 0.0025   Epoch: 16   Global Step: 208910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:40,018-Speed 2951.22 samples/sec   Loss 2.2716   LearningRate 0.0025   Epoch: 16   Global Step: 208920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:43,424-Speed 3006.45 samples/sec   Loss 2.2116   LearningRate 0.0025   Epoch: 16   Global Step: 208930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:46,852-Speed 2988.29 samples/sec   Loss 2.3290   LearningRate 0.0025   Epoch: 16   Global Step: 208940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:50,254-Speed 3010.63 samples/sec   Loss 2.3558   LearningRate 0.0025   Epoch: 16   Global Step: 208950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:53,696-Speed 2975.95 samples/sec   Loss 2.3437   LearningRate 0.0025   Epoch: 16   Global Step: 208960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:30:57,109-Speed 3001.71 samples/sec   Loss 2.3111   LearningRate 0.0025   Epoch: 16   Global Step: 208970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:31:00,439-Speed 3075.94 samples/sec   Loss 2.3555   LearningRate 0.0025   Epoch: 16   Global Step: 208980   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:31:03,916-Speed 2945.65 samples/sec   Loss 2.3419   LearningRate 0.0025   Epoch: 16   Global Step: 208990   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:31:07,309-Speed 3018.99 samples/sec   Loss 2.3416   LearningRate 0.0025   Epoch: 16   Global Step: 209000   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:31:10,718-Speed 3003.99 samples/sec   Loss 2.3216   LearningRate 0.0025   Epoch: 16   Global Step: 209010   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:31:14,120-Speed 3011.34 samples/sec   Loss 2.3330   LearningRate 0.0025   Epoch: 16   Global Step: 209020   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:31:17,532-Speed 3001.86 samples/sec   Loss 2.3043   LearningRate 0.0025   Epoch: 16   Global Step: 209030   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:31:20,908-Speed 3033.81 samples/sec   Loss 2.2289   LearningRate 0.0025   Epoch: 16   Global Step: 209040   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:31:24,288-Speed 3030.43 samples/sec   Loss 2.3023   LearningRate 0.0025   Epoch: 16   Global Step: 209050   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:31:27,673-Speed 3025.76 samples/sec   Loss 2.3286   LearningRate 0.0025   Epoch: 16   Global Step: 209060   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:31:31,159-Speed 2938.41 samples/sec   Loss 2.3185   LearningRate 0.0025   Epoch: 16   Global Step: 209070   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:31:34,596-Speed 2980.86 samples/sec   Loss 2.3440   LearningRate 0.0025   Epoch: 16   Global Step: 209080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:31:37,932-Speed 3070.20 samples/sec   Loss 2.2980   LearningRate 0.0025   Epoch: 16   Global Step: 209090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:31:41,281-Speed 3058.03 samples/sec   Loss 2.2879   LearningRate 0.0025   Epoch: 16   Global Step: 209100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:31:44,702-Speed 2994.54 samples/sec   Loss 2.2418   LearningRate 0.0025   Epoch: 16   Global Step: 209110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:31:48,097-Speed 3017.11 samples/sec   Loss 2.3010   LearningRate 0.0025   Epoch: 16   Global Step: 209120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:31:51,486-Speed 3023.02 samples/sec   Loss 2.2659   LearningRate 0.0025   Epoch: 16   Global Step: 209130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:31:54,930-Speed 2973.19 samples/sec   Loss 2.2618   LearningRate 0.0025   Epoch: 16   Global Step: 209140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:31:58,287-Speed 3051.92 samples/sec   Loss 2.3654   LearningRate 0.0025   Epoch: 16   Global Step: 209150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:32:01,607-Speed 3085.67 samples/sec   Loss 2.3057   LearningRate 0.0025   Epoch: 16   Global Step: 209160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:32:04,972-Speed 3043.55 samples/sec   Loss 2.3243   LearningRate 0.0025   Epoch: 16   Global Step: 209170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:32:08,357-Speed 3025.65 samples/sec   Loss 2.2817   LearningRate 0.0025   Epoch: 16   Global Step: 209180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:32:11,646-Speed 3116.69 samples/sec   Loss 2.3201   LearningRate 0.0025   Epoch: 16   Global Step: 209190   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:32:15,093-Speed 2971.53 samples/sec   Loss 2.2496   LearningRate 0.0025   Epoch: 16   Global Step: 209200   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:32:18,524-Speed 2985.33 samples/sec   Loss 2.3684   LearningRate 0.0025   Epoch: 16   Global Step: 209210   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:32:21,913-Speed 3022.79 samples/sec   Loss 2.2344   LearningRate 0.0025   Epoch: 16   Global Step: 209220   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:32:25,255-Speed 3064.63 samples/sec   Loss 2.2976   LearningRate 0.0025   Epoch: 16   Global Step: 209230   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:32:28,675-Speed 2995.38 samples/sec   Loss 2.3030   LearningRate 0.0025   Epoch: 16   Global Step: 209240   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:32:32,063-Speed 3022.93 samples/sec   Loss 2.3307   LearningRate 0.0025   Epoch: 16   Global Step: 209250   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:32:35,466-Speed 3009.93 samples/sec   Loss 2.2692   LearningRate 0.0025   Epoch: 16   Global Step: 209260   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:32:38,939-Speed 2949.46 samples/sec   Loss 2.3305   LearningRate 0.0025   Epoch: 16   Global Step: 209270   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:32:42,399-Speed 2960.46 samples/sec   Loss 2.3525   LearningRate 0.0025   Epoch: 16   Global Step: 209280   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 21:32:45,842-Speed 2974.90 samples/sec   Loss 2.2109   LearningRate 0.0025   Epoch: 16   Global Step: 209290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:32:49,200-Speed 3050.39 samples/sec   Loss 2.3172   LearningRate 0.0025   Epoch: 16   Global Step: 209300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:32:52,566-Speed 3043.30 samples/sec   Loss 2.2835   LearningRate 0.0025   Epoch: 16   Global Step: 209310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:32:55,922-Speed 3052.00 samples/sec   Loss 2.2641   LearningRate 0.0025   Epoch: 16   Global Step: 209320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:32:59,336-Speed 2999.62 samples/sec   Loss 2.2925   LearningRate 0.0025   Epoch: 16   Global Step: 209330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:33:02,766-Speed 2986.53 samples/sec   Loss 2.2995   LearningRate 0.0025   Epoch: 16   Global Step: 209340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:33:06,090-Speed 3081.28 samples/sec   Loss 2.2850   LearningRate 0.0025   Epoch: 16   Global Step: 209350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:33:09,450-Speed 3048.63 samples/sec   Loss 2.2237   LearningRate 0.0025   Epoch: 16   Global Step: 209360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:33:12,790-Speed 3066.97 samples/sec   Loss 2.2793   LearningRate 0.0025   Epoch: 16   Global Step: 209370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:33:16,095-Speed 3098.55 samples/sec   Loss 2.3470   LearningRate 0.0025   Epoch: 16   Global Step: 209380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:33:19,424-Speed 3077.24 samples/sec   Loss 2.2479   LearningRate 0.0025   Epoch: 16   Global Step: 209390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:33:22,820-Speed 3016.56 samples/sec   Loss 2.2936   LearningRate 0.0025   Epoch: 16   Global Step: 209400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:33:26,223-Speed 3009.14 samples/sec   Loss 2.3102   LearningRate 0.0025   Epoch: 16   Global Step: 209410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:33:29,586-Speed 3046.26 samples/sec   Loss 2.2840   LearningRate 0.0025   Epoch: 16   Global Step: 209420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:33:32,920-Speed 3072.18 samples/sec   Loss 2.3316   LearningRate 0.0025   Epoch: 16   Global Step: 209430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:33:36,256-Speed 3070.65 samples/sec   Loss 2.3414   LearningRate 0.0025   Epoch: 16   Global Step: 209440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:33:39,600-Speed 3063.32 samples/sec   Loss 2.2851   LearningRate 0.0025   Epoch: 16   Global Step: 209450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:33:42,953-Speed 3054.24 samples/sec   Loss 2.2724   LearningRate 0.0025   Epoch: 16   Global Step: 209460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:33:46,403-Speed 2968.62 samples/sec   Loss 2.3322   LearningRate 0.0025   Epoch: 16   Global Step: 209470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:33:49,776-Speed 3037.04 samples/sec   Loss 2.2702   LearningRate 0.0025   Epoch: 16   Global Step: 209480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:33:53,168-Speed 3020.15 samples/sec   Loss 2.2794   LearningRate 0.0025   Epoch: 16   Global Step: 209490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:33:56,499-Speed 3074.45 samples/sec   Loss 2.2760   LearningRate 0.0025   Epoch: 16   Global Step: 209500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:33:59,869-Speed 3039.89 samples/sec   Loss 2.3888   LearningRate 0.0025   Epoch: 16   Global Step: 209510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:34:03,192-Speed 3082.69 samples/sec   Loss 2.2480   LearningRate 0.0025   Epoch: 16   Global Step: 209520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:34:06,601-Speed 3004.70 samples/sec   Loss 2.2816   LearningRate 0.0025   Epoch: 16   Global Step: 209530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:34:10,017-Speed 2997.94 samples/sec   Loss 2.2955   LearningRate 0.0024   Epoch: 16   Global Step: 209540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:34:13,390-Speed 3037.16 samples/sec   Loss 2.2628   LearningRate 0.0024   Epoch: 16   Global Step: 209550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:34:16,787-Speed 3014.91 samples/sec   Loss 2.3690   LearningRate 0.0024   Epoch: 16   Global Step: 209560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:34:20,225-Speed 2979.34 samples/sec   Loss 2.3630   LearningRate 0.0024   Epoch: 16   Global Step: 209570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:34:23,576-Speed 3056.75 samples/sec   Loss 2.2885   LearningRate 0.0024   Epoch: 16   Global Step: 209580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:34:26,895-Speed 3085.47 samples/sec   Loss 2.2211   LearningRate 0.0024   Epoch: 16   Global Step: 209590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:34:30,213-Speed 3088.68 samples/sec   Loss 2.2795   LearningRate 0.0024   Epoch: 16   Global Step: 209600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:34:33,727-Speed 2914.89 samples/sec   Loss 2.2761   LearningRate 0.0024   Epoch: 16   Global Step: 209610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:34:37,045-Speed 3087.72 samples/sec   Loss 2.3358   LearningRate 0.0024   Epoch: 16   Global Step: 209620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:34:40,439-Speed 3017.72 samples/sec   Loss 2.2335   LearningRate 0.0024   Epoch: 16   Global Step: 209630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:34:43,868-Speed 2987.32 samples/sec   Loss 2.2770   LearningRate 0.0024   Epoch: 16   Global Step: 209640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:34:47,315-Speed 2971.49 samples/sec   Loss 2.2846   LearningRate 0.0024   Epoch: 16   Global Step: 209650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:34:50,660-Speed 3062.24 samples/sec   Loss 2.2760   LearningRate 0.0024   Epoch: 16   Global Step: 209660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:34:54,124-Speed 2956.70 samples/sec   Loss 2.3774   LearningRate 0.0024   Epoch: 16   Global Step: 209670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:34:57,451-Speed 3078.15 samples/sec   Loss 2.2790   LearningRate 0.0024   Epoch: 16   Global Step: 209680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:35:00,792-Speed 3066.20 samples/sec   Loss 2.3126   LearningRate 0.0024   Epoch: 16   Global Step: 209690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:35:04,234-Speed 2975.62 samples/sec   Loss 2.2751   LearningRate 0.0024   Epoch: 16   Global Step: 209700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:35:07,631-Speed 3016.01 samples/sec   Loss 2.2806   LearningRate 0.0024   Epoch: 16   Global Step: 209710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:35:10,998-Speed 3042.22 samples/sec   Loss 2.2649   LearningRate 0.0024   Epoch: 16   Global Step: 209720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:35:14,452-Speed 2965.42 samples/sec   Loss 2.2647   LearningRate 0.0024   Epoch: 16   Global Step: 209730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:35:17,854-Speed 3010.18 samples/sec   Loss 2.2817   LearningRate 0.0024   Epoch: 16   Global Step: 209740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:35:21,296-Speed 2976.14 samples/sec   Loss 2.2801   LearningRate 0.0024   Epoch: 16   Global Step: 209750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:35:24,685-Speed 3022.14 samples/sec   Loss 2.3031   LearningRate 0.0024   Epoch: 16   Global Step: 209760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:35:28,134-Speed 2970.19 samples/sec   Loss 2.3219   LearningRate 0.0024   Epoch: 16   Global Step: 209770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:35:31,557-Speed 2992.42 samples/sec   Loss 2.3039   LearningRate 0.0024   Epoch: 16   Global Step: 209780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:35:34,934-Speed 3033.54 samples/sec   Loss 2.3187   LearningRate 0.0024   Epoch: 16   Global Step: 209790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:35:38,405-Speed 2951.32 samples/sec   Loss 2.2862   LearningRate 0.0024   Epoch: 16   Global Step: 209800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:35:41,878-Speed 2949.37 samples/sec   Loss 2.3109   LearningRate 0.0024   Epoch: 16   Global Step: 209810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:35:45,239-Speed 3046.78 samples/sec   Loss 2.3156   LearningRate 0.0024   Epoch: 16   Global Step: 209820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:35:48,606-Speed 3041.82 samples/sec   Loss 2.2619   LearningRate 0.0024   Epoch: 16   Global Step: 209830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:35:51,907-Speed 3103.18 samples/sec   Loss 2.2222   LearningRate 0.0024   Epoch: 16   Global Step: 209840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:35:55,359-Speed 2967.32 samples/sec   Loss 2.2971   LearningRate 0.0024   Epoch: 16   Global Step: 209850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:35:58,707-Speed 3059.55 samples/sec   Loss 2.3163   LearningRate 0.0024   Epoch: 16   Global Step: 209860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:36:02,037-Speed 3076.19 samples/sec   Loss 2.2297   LearningRate 0.0024   Epoch: 16   Global Step: 209870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:36:05,452-Speed 2999.30 samples/sec   Loss 2.2872   LearningRate 0.0024   Epoch: 16   Global Step: 209880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:36:08,765-Speed 3091.46 samples/sec   Loss 2.3100   LearningRate 0.0024   Epoch: 16   Global Step: 209890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:36:12,133-Speed 3041.46 samples/sec   Loss 2.3064   LearningRate 0.0024   Epoch: 16   Global Step: 209900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:36:15,477-Speed 3062.65 samples/sec   Loss 2.2554   LearningRate 0.0024   Epoch: 16   Global Step: 209910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:36:18,872-Speed 3017.31 samples/sec   Loss 2.2752   LearningRate 0.0024   Epoch: 16   Global Step: 209920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:36:22,301-Speed 2987.50 samples/sec   Loss 2.2744   LearningRate 0.0024   Epoch: 16   Global Step: 209930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:36:25,644-Speed 3064.03 samples/sec   Loss 2.2958   LearningRate 0.0024   Epoch: 16   Global Step: 209940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:36:29,113-Speed 2952.92 samples/sec   Loss 2.2542   LearningRate 0.0024   Epoch: 16   Global Step: 209950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:36:32,483-Speed 3039.38 samples/sec   Loss 2.4116   LearningRate 0.0024   Epoch: 16   Global Step: 209960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:36:35,917-Speed 2983.35 samples/sec   Loss 2.3086   LearningRate 0.0024   Epoch: 16   Global Step: 209970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:36:39,273-Speed 3051.72 samples/sec   Loss 2.2390   LearningRate 0.0024   Epoch: 16   Global Step: 209980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:36:42,648-Speed 3034.74 samples/sec   Loss 2.2821   LearningRate 0.0024   Epoch: 16   Global Step: 209990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:36:46,059-Speed 3002.74 samples/sec   Loss 2.3332   LearningRate 0.0024   Epoch: 16   Global Step: 210000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:36:49,467-Speed 3006.05 samples/sec   Loss 2.2258   LearningRate 0.0024   Epoch: 16   Global Step: 210010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:36:52,922-Speed 2966.00 samples/sec   Loss 2.3376   LearningRate 0.0024   Epoch: 16   Global Step: 210020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:36:56,387-Speed 2955.69 samples/sec   Loss 2.3270   LearningRate 0.0024   Epoch: 16   Global Step: 210030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:36:59,732-Speed 3061.95 samples/sec   Loss 2.2910   LearningRate 0.0024   Epoch: 16   Global Step: 210040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:37:03,045-Speed 3092.50 samples/sec   Loss 2.2904   LearningRate 0.0024   Epoch: 16   Global Step: 210050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:37:06,413-Speed 3041.41 samples/sec   Loss 2.3492   LearningRate 0.0024   Epoch: 16   Global Step: 210060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:37:09,827-Speed 3000.66 samples/sec   Loss 2.2247   LearningRate 0.0024   Epoch: 16   Global Step: 210070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:37:13,215-Speed 3022.57 samples/sec   Loss 2.2525   LearningRate 0.0024   Epoch: 16   Global Step: 210080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:37:16,601-Speed 3024.85 samples/sec   Loss 2.3394   LearningRate 0.0024   Epoch: 16   Global Step: 210090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:37:19,958-Speed 3052.07 samples/sec   Loss 2.2666   LearningRate 0.0024   Epoch: 16   Global Step: 210100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:37:23,342-Speed 3026.67 samples/sec   Loss 2.2524   LearningRate 0.0024   Epoch: 16   Global Step: 210110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:37:26,671-Speed 3076.68 samples/sec   Loss 2.2678   LearningRate 0.0024   Epoch: 16   Global Step: 210120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:37:30,043-Speed 3038.04 samples/sec   Loss 2.2282   LearningRate 0.0024   Epoch: 16   Global Step: 210130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:37:33,370-Speed 3079.00 samples/sec   Loss 2.3883   LearningRate 0.0024   Epoch: 16   Global Step: 210140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:37:36,710-Speed 3065.96 samples/sec   Loss 2.2116   LearningRate 0.0024   Epoch: 16   Global Step: 210150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:37:40,144-Speed 2982.75 samples/sec   Loss 2.2703   LearningRate 0.0024   Epoch: 16   Global Step: 210160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:37:43,458-Speed 3091.00 samples/sec   Loss 2.2563   LearningRate 0.0024   Epoch: 16   Global Step: 210170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:37:46,845-Speed 3024.26 samples/sec   Loss 2.2729   LearningRate 0.0024   Epoch: 16   Global Step: 210180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:37:50,243-Speed 3014.60 samples/sec   Loss 2.3195   LearningRate 0.0024   Epoch: 16   Global Step: 210190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:37:53,589-Speed 3061.02 samples/sec   Loss 2.3106   LearningRate 0.0024   Epoch: 16   Global Step: 210200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:37:56,945-Speed 3051.73 samples/sec   Loss 2.2773   LearningRate 0.0024   Epoch: 16   Global Step: 210210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:38:00,422-Speed 2946.49 samples/sec   Loss 2.2645   LearningRate 0.0024   Epoch: 16   Global Step: 210220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:38:03,858-Speed 2981.32 samples/sec   Loss 2.2884   LearningRate 0.0024   Epoch: 16   Global Step: 210230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:38:07,213-Speed 3053.33 samples/sec   Loss 2.3418   LearningRate 0.0024   Epoch: 16   Global Step: 210240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:38:10,614-Speed 3011.21 samples/sec   Loss 2.2844   LearningRate 0.0024   Epoch: 16   Global Step: 210250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:38:14,039-Speed 2990.60 samples/sec   Loss 2.3132   LearningRate 0.0024   Epoch: 16   Global Step: 210260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:38:17,422-Speed 3027.95 samples/sec   Loss 2.3569   LearningRate 0.0024   Epoch: 16   Global Step: 210270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:38:20,937-Speed 2914.25 samples/sec   Loss 2.2386   LearningRate 0.0024   Epoch: 16   Global Step: 210280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:38:24,325-Speed 3023.14 samples/sec   Loss 2.2230   LearningRate 0.0024   Epoch: 16   Global Step: 210290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:38:27,715-Speed 3021.78 samples/sec   Loss 2.2308   LearningRate 0.0024   Epoch: 16   Global Step: 210300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:38:31,089-Speed 3035.62 samples/sec   Loss 2.3209   LearningRate 0.0024   Epoch: 16   Global Step: 210310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:38:34,474-Speed 3026.58 samples/sec   Loss 2.2985   LearningRate 0.0024   Epoch: 16   Global Step: 210320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:38:37,841-Speed 3042.13 samples/sec   Loss 2.2476   LearningRate 0.0024   Epoch: 16   Global Step: 210330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:38:41,147-Speed 3098.21 samples/sec   Loss 2.3215   LearningRate 0.0023   Epoch: 16   Global Step: 210340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:38:44,543-Speed 3015.84 samples/sec   Loss 2.3237   LearningRate 0.0023   Epoch: 16   Global Step: 210350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:38:47,877-Speed 3072.00 samples/sec   Loss 2.2829   LearningRate 0.0023   Epoch: 16   Global Step: 210360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:38:51,231-Speed 3054.18 samples/sec   Loss 2.3153   LearningRate 0.0023   Epoch: 16   Global Step: 210370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:38:54,554-Speed 3082.80 samples/sec   Loss 2.3051   LearningRate 0.0023   Epoch: 16   Global Step: 210380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:38:57,929-Speed 3034.30 samples/sec   Loss 2.2511   LearningRate 0.0023   Epoch: 16   Global Step: 210390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:39:01,320-Speed 3021.47 samples/sec   Loss 2.2619   LearningRate 0.0023   Epoch: 16   Global Step: 210400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:39:04,731-Speed 3002.46 samples/sec   Loss 2.2575   LearningRate 0.0023   Epoch: 16   Global Step: 210410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:39:08,094-Speed 3045.87 samples/sec   Loss 2.1983   LearningRate 0.0023   Epoch: 16   Global Step: 210420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:39:11,464-Speed 3039.32 samples/sec   Loss 2.2999   LearningRate 0.0023   Epoch: 16   Global Step: 210430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:39:14,844-Speed 3030.53 samples/sec   Loss 2.2525   LearningRate 0.0023   Epoch: 16   Global Step: 210440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:39:18,202-Speed 3050.36 samples/sec   Loss 2.2922   LearningRate 0.0023   Epoch: 16   Global Step: 210450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:39:21,573-Speed 3038.43 samples/sec   Loss 2.3685   LearningRate 0.0023   Epoch: 16   Global Step: 210460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:39:24,931-Speed 3050.40 samples/sec   Loss 2.3153   LearningRate 0.0023   Epoch: 16   Global Step: 210470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:39:28,288-Speed 3051.54 samples/sec   Loss 2.2875   LearningRate 0.0023   Epoch: 16   Global Step: 210480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:39:31,733-Speed 2973.69 samples/sec   Loss 2.2527   LearningRate 0.0023   Epoch: 16   Global Step: 210490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:39:35,066-Speed 3072.51 samples/sec   Loss 2.3164   LearningRate 0.0023   Epoch: 16   Global Step: 210500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:39:38,400-Speed 3072.97 samples/sec   Loss 2.3225   LearningRate 0.0023   Epoch: 16   Global Step: 210510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:39:41,760-Speed 3048.76 samples/sec   Loss 2.3061   LearningRate 0.0023   Epoch: 16   Global Step: 210520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:39:45,158-Speed 3013.68 samples/sec   Loss 2.2870   LearningRate 0.0023   Epoch: 16   Global Step: 210530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:39:48,550-Speed 3019.47 samples/sec   Loss 2.3084   LearningRate 0.0023   Epoch: 16   Global Step: 210540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:39:51,931-Speed 3030.18 samples/sec   Loss 2.2467   LearningRate 0.0023   Epoch: 16   Global Step: 210550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:39:55,350-Speed 2996.30 samples/sec   Loss 2.2696   LearningRate 0.0023   Epoch: 16   Global Step: 210560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:39:58,804-Speed 2966.15 samples/sec   Loss 2.3175   LearningRate 0.0023   Epoch: 16   Global Step: 210570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:02,157-Speed 3054.44 samples/sec   Loss 2.2848   LearningRate 0.0023   Epoch: 16   Global Step: 210580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:05,563-Speed 3007.47 samples/sec   Loss 2.2600   LearningRate 0.0023   Epoch: 16   Global Step: 210590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:08,978-Speed 2999.53 samples/sec   Loss 2.2368   LearningRate 0.0023   Epoch: 16   Global Step: 210600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:12,360-Speed 3028.60 samples/sec   Loss 2.3678   LearningRate 0.0023   Epoch: 16   Global Step: 210610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:40:15,831-Speed 2951.22 samples/sec   Loss 2.2798   LearningRate 0.0023   Epoch: 16   Global Step: 210620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:40:19,141-Speed 3094.33 samples/sec   Loss 2.3189   LearningRate 0.0023   Epoch: 16   Global Step: 210630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:40:22,452-Speed 3093.46 samples/sec   Loss 2.2040   LearningRate 0.0023   Epoch: 16   Global Step: 210640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:25,789-Speed 3068.91 samples/sec   Loss 2.3079   LearningRate 0.0023   Epoch: 16   Global Step: 210650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:29,203-Speed 3001.15 samples/sec   Loss 2.2506   LearningRate 0.0023   Epoch: 16   Global Step: 210660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:32,529-Speed 3079.39 samples/sec   Loss 2.2258   LearningRate 0.0023   Epoch: 16   Global Step: 210670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:35,898-Speed 3039.81 samples/sec   Loss 2.2719   LearningRate 0.0023   Epoch: 16   Global Step: 210680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:39,238-Speed 3066.54 samples/sec   Loss 2.3959   LearningRate 0.0023   Epoch: 16   Global Step: 210690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:42,632-Speed 3018.39 samples/sec   Loss 2.2368   LearningRate 0.0023   Epoch: 16   Global Step: 210700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:45,969-Speed 3069.45 samples/sec   Loss 2.2568   LearningRate 0.0023   Epoch: 16   Global Step: 210710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:49,341-Speed 3037.54 samples/sec   Loss 2.2828   LearningRate 0.0023   Epoch: 16   Global Step: 210720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:52,786-Speed 2973.71 samples/sec   Loss 2.2338   LearningRate 0.0023   Epoch: 16   Global Step: 210730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:40:56,157-Speed 3038.07 samples/sec   Loss 2.2423   LearningRate 0.0023   Epoch: 16   Global Step: 210740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:40:59,628-Speed 2951.46 samples/sec   Loss 2.2721   LearningRate 0.0023   Epoch: 16   Global Step: 210750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:41:02,979-Speed 3055.88 samples/sec   Loss 2.2736   LearningRate 0.0023   Epoch: 16   Global Step: 210760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:41:06,342-Speed 3045.57 samples/sec   Loss 2.2416   LearningRate 0.0023   Epoch: 16   Global Step: 210770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:41:09,643-Speed 3103.57 samples/sec   Loss 2.3273   LearningRate 0.0023   Epoch: 16   Global Step: 210780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:41:13,009-Speed 3042.98 samples/sec   Loss 2.3388   LearningRate 0.0023   Epoch: 16   Global Step: 210790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:41:16,448-Speed 2978.00 samples/sec   Loss 2.2494   LearningRate 0.0023   Epoch: 16   Global Step: 210800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:41:19,803-Speed 3053.03 samples/sec   Loss 2.2549   LearningRate 0.0023   Epoch: 16   Global Step: 210810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:41:23,171-Speed 3041.12 samples/sec   Loss 2.2701   LearningRate 0.0023   Epoch: 16   Global Step: 210820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:41:26,500-Speed 3077.22 samples/sec   Loss 2.2465   LearningRate 0.0023   Epoch: 16   Global Step: 210830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:41:29,864-Speed 3045.20 samples/sec   Loss 2.3468   LearningRate 0.0023   Epoch: 16   Global Step: 210840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:41:33,232-Speed 3040.53 samples/sec   Loss 2.2576   LearningRate 0.0023   Epoch: 16   Global Step: 210850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:41:36,659-Speed 2989.22 samples/sec   Loss 2.2554   LearningRate 0.0023   Epoch: 16   Global Step: 210860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:41:40,031-Speed 3037.35 samples/sec   Loss 2.2754   LearningRate 0.0023   Epoch: 16   Global Step: 210870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:41:43,457-Speed 2989.69 samples/sec   Loss 2.3200   LearningRate 0.0023   Epoch: 16   Global Step: 210880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:41:46,815-Speed 3049.67 samples/sec   Loss 2.2562   LearningRate 0.0023   Epoch: 16   Global Step: 210890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:41:50,183-Speed 3041.92 samples/sec   Loss 2.2744   LearningRate 0.0023   Epoch: 16   Global Step: 210900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:41:53,558-Speed 3034.88 samples/sec   Loss 2.3121   LearningRate 0.0023   Epoch: 16   Global Step: 210910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:41:56,911-Speed 3054.76 samples/sec   Loss 2.2301   LearningRate 0.0023   Epoch: 16   Global Step: 210920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:42:00,288-Speed 3033.53 samples/sec   Loss 2.3178   LearningRate 0.0023   Epoch: 16   Global Step: 210930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:42:03,681-Speed 3018.60 samples/sec   Loss 2.3466   LearningRate 0.0023   Epoch: 16   Global Step: 210940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:42:07,090-Speed 3004.64 samples/sec   Loss 2.2567   LearningRate 0.0023   Epoch: 16   Global Step: 210950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:42:10,423-Speed 3073.45 samples/sec   Loss 2.2783   LearningRate 0.0023   Epoch: 16   Global Step: 210960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:42:13,890-Speed 2954.10 samples/sec   Loss 2.3295   LearningRate 0.0023   Epoch: 16   Global Step: 210970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:42:17,261-Speed 3038.61 samples/sec   Loss 2.2386   LearningRate 0.0023   Epoch: 16   Global Step: 210980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:42:20,585-Speed 3081.52 samples/sec   Loss 2.2818   LearningRate 0.0023   Epoch: 16   Global Step: 210990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:42:24,039-Speed 2965.45 samples/sec   Loss 2.2272   LearningRate 0.0023   Epoch: 16   Global Step: 211000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:42:27,417-Speed 3031.97 samples/sec   Loss 2.3156   LearningRate 0.0023   Epoch: 16   Global Step: 211010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:42:30,854-Speed 2979.96 samples/sec   Loss 2.3141   LearningRate 0.0023   Epoch: 16   Global Step: 211020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:42:34,212-Speed 3050.38 samples/sec   Loss 2.2042   LearningRate 0.0023   Epoch: 16   Global Step: 211030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:42:37,622-Speed 3004.17 samples/sec   Loss 2.3451   LearningRate 0.0023   Epoch: 16   Global Step: 211040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:42:40,966-Speed 3062.69 samples/sec   Loss 2.3103   LearningRate 0.0023   Epoch: 16   Global Step: 211050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:42:44,340-Speed 3036.11 samples/sec   Loss 2.2138   LearningRate 0.0023   Epoch: 16   Global Step: 211060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:42:47,742-Speed 3011.29 samples/sec   Loss 2.2193   LearningRate 0.0023   Epoch: 16   Global Step: 211070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:42:51,056-Speed 3090.94 samples/sec   Loss 2.2910   LearningRate 0.0023   Epoch: 16   Global Step: 211080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:42:54,408-Speed 3055.06 samples/sec   Loss 2.3557   LearningRate 0.0023   Epoch: 16   Global Step: 211090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:42:57,725-Speed 3089.20 samples/sec   Loss 2.2670   LearningRate 0.0023   Epoch: 16   Global Step: 211100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:43:01,073-Speed 3058.72 samples/sec   Loss 2.2669   LearningRate 0.0023   Epoch: 16   Global Step: 211110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:43:04,448-Speed 3035.16 samples/sec   Loss 2.2525   LearningRate 0.0023   Epoch: 16   Global Step: 211120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:43:07,842-Speed 3017.73 samples/sec   Loss 2.3168   LearningRate 0.0023   Epoch: 16   Global Step: 211130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:43:11,165-Speed 3082.39 samples/sec   Loss 2.2866   LearningRate 0.0023   Epoch: 16   Global Step: 211140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:43:14,785-Speed 2829.24 samples/sec   Loss 2.3285   LearningRate 0.0023   Epoch: 16   Global Step: 211150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:43:45,436-Speed 334.10 samples/sec   Loss 2.0658   LearningRate 0.0022   Epoch: 17   Global Step: 211160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:43:48,973-Speed 2896.92 samples/sec   Loss 1.4357   LearningRate 0.0022   Epoch: 17   Global Step: 211170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:43:52,544-Speed 2868.51 samples/sec   Loss 1.4037   LearningRate 0.0022   Epoch: 17   Global Step: 211180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:43:55,931-Speed 3025.26 samples/sec   Loss 1.4843   LearningRate 0.0022   Epoch: 17   Global Step: 211190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 21:43:59,309-Speed 3032.41 samples/sec   Loss 1.3992   LearningRate 0.0022   Epoch: 17   Global Step: 211200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:44:02,736-Speed 2988.74 samples/sec   Loss 1.4171   LearningRate 0.0022   Epoch: 17   Global Step: 211210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:44:06,242-Speed 2921.52 samples/sec   Loss 1.4186   LearningRate 0.0022   Epoch: 17   Global Step: 211220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:44:09,689-Speed 2971.29 samples/sec   Loss 1.3853   LearningRate 0.0022   Epoch: 17   Global Step: 211230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:44:13,068-Speed 3032.15 samples/sec   Loss 1.4291   LearningRate 0.0022   Epoch: 17   Global Step: 211240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:44:16,393-Speed 3080.15 samples/sec   Loss 1.4220   LearningRate 0.0022   Epoch: 17   Global Step: 211250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:44:19,891-Speed 2928.69 samples/sec   Loss 1.4336   LearningRate 0.0022   Epoch: 17   Global Step: 211260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:44:23,231-Speed 3066.80 samples/sec   Loss 1.3711   LearningRate 0.0022   Epoch: 17   Global Step: 211270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:44:26,646-Speed 2998.71 samples/sec   Loss 1.3598   LearningRate 0.0022   Epoch: 17   Global Step: 211280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:44:30,023-Speed 3033.74 samples/sec   Loss 1.3697   LearningRate 0.0022   Epoch: 17   Global Step: 211290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:44:33,427-Speed 3009.33 samples/sec   Loss 1.3671   LearningRate 0.0022   Epoch: 17   Global Step: 211300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:44:36,776-Speed 3057.92 samples/sec   Loss 1.4721   LearningRate 0.0022   Epoch: 17   Global Step: 211310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:44:40,252-Speed 2947.22 samples/sec   Loss 1.3738   LearningRate 0.0022   Epoch: 17   Global Step: 211320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:44:43,624-Speed 3037.78 samples/sec   Loss 1.4142   LearningRate 0.0022   Epoch: 17   Global Step: 211330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:44:47,138-Speed 2914.29 samples/sec   Loss 1.4376   LearningRate 0.0022   Epoch: 17   Global Step: 211340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 21:44:50,571-Speed 2984.36 samples/sec   Loss 1.3867   LearningRate 0.0022   Epoch: 17   Global Step: 211350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:44:54,154-Speed 2858.32 samples/sec   Loss 1.4066   LearningRate 0.0022   Epoch: 17   Global Step: 211360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:44:57,660-Speed 2922.19 samples/sec   Loss 1.4205   LearningRate 0.0022   Epoch: 17   Global Step: 211370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:45:01,069-Speed 3004.09 samples/sec   Loss 1.3549   LearningRate 0.0022   Epoch: 17   Global Step: 211380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:45:04,550-Speed 2943.26 samples/sec   Loss 1.4018   LearningRate 0.0022   Epoch: 17   Global Step: 211390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 21:45:07,962-Speed 3002.45 samples/sec   Loss 1.4148   LearningRate 0.0022   Epoch: 17   Global Step: 211400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:45:11,386-Speed 2991.69 samples/sec   Loss 1.3872   LearningRate 0.0022   Epoch: 17   Global Step: 211410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:45:14,805-Speed 2995.46 samples/sec   Loss 1.4436   LearningRate 0.0022   Epoch: 17   Global Step: 211420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:45:18,236-Speed 2986.10 samples/sec   Loss 1.4141   LearningRate 0.0022   Epoch: 17   Global Step: 211430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:45:21,710-Speed 2948.13 samples/sec   Loss 1.4034   LearningRate 0.0022   Epoch: 17   Global Step: 211440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:45:25,105-Speed 3017.31 samples/sec   Loss 1.4596   LearningRate 0.0022   Epoch: 17   Global Step: 211450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 21:45:28,482-Speed 3033.33 samples/sec   Loss 1.4498   LearningRate 0.0022   Epoch: 17   Global Step: 211460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 21:45:31,939-Speed 2963.23 samples/sec   Loss 1.4937   LearningRate 0.0022   Epoch: 17   Global Step: 211470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:45:35,356-Speed 2998.22 samples/sec   Loss 1.4180   LearningRate 0.0022   Epoch: 17   Global Step: 211480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:45:38,797-Speed 2976.07 samples/sec   Loss 1.4269   LearningRate 0.0022   Epoch: 17   Global Step: 211490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:45:42,136-Speed 3067.79 samples/sec   Loss 1.4403   LearningRate 0.0022   Epoch: 17   Global Step: 211500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:45:45,533-Speed 3015.98 samples/sec   Loss 1.4266   LearningRate 0.0022   Epoch: 17   Global Step: 211510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:45:48,990-Speed 2962.89 samples/sec   Loss 1.4040   LearningRate 0.0022   Epoch: 17   Global Step: 211520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:45:52,414-Speed 2990.65 samples/sec   Loss 1.4824   LearningRate 0.0022   Epoch: 17   Global Step: 211530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:45:55,820-Speed 3007.94 samples/sec   Loss 1.4414   LearningRate 0.0022   Epoch: 17   Global Step: 211540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:45:59,315-Speed 2930.90 samples/sec   Loss 1.4788   LearningRate 0.0022   Epoch: 17   Global Step: 211550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:46:02,745-Speed 2985.54 samples/sec   Loss 1.4205   LearningRate 0.0022   Epoch: 17   Global Step: 211560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:46:06,173-Speed 2988.65 samples/sec   Loss 1.4223   LearningRate 0.0022   Epoch: 17   Global Step: 211570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:46:09,577-Speed 3008.66 samples/sec   Loss 1.4177   LearningRate 0.0022   Epoch: 17   Global Step: 211580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:46:13,009-Speed 2984.31 samples/sec   Loss 1.4466   LearningRate 0.0022   Epoch: 17   Global Step: 211590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:46:16,446-Speed 2980.52 samples/sec   Loss 1.3622   LearningRate 0.0022   Epoch: 17   Global Step: 211600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:46:19,906-Speed 2960.31 samples/sec   Loss 1.4649   LearningRate 0.0022   Epoch: 17   Global Step: 211610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:46:23,281-Speed 3035.33 samples/sec   Loss 1.4361   LearningRate 0.0022   Epoch: 17   Global Step: 211620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:46:26,744-Speed 2957.50 samples/sec   Loss 1.4265   LearningRate 0.0022   Epoch: 17   Global Step: 211630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:46:30,182-Speed 2979.50 samples/sec   Loss 1.4643   LearningRate 0.0022   Epoch: 17   Global Step: 211640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:46:33,583-Speed 3011.94 samples/sec   Loss 1.4038   LearningRate 0.0022   Epoch: 17   Global Step: 211650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:46:36,966-Speed 3028.25 samples/sec   Loss 1.4841   LearningRate 0.0022   Epoch: 17   Global Step: 211660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:46:40,348-Speed 3027.98 samples/sec   Loss 1.4484   LearningRate 0.0022   Epoch: 17   Global Step: 211670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:46:43,745-Speed 3016.06 samples/sec   Loss 1.4986   LearningRate 0.0022   Epoch: 17   Global Step: 211680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:46:47,143-Speed 3013.62 samples/sec   Loss 1.4406   LearningRate 0.0022   Epoch: 17   Global Step: 211690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:46:50,531-Speed 3023.43 samples/sec   Loss 1.4479   LearningRate 0.0022   Epoch: 17   Global Step: 211700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:46:53,989-Speed 2962.04 samples/sec   Loss 1.4649   LearningRate 0.0022   Epoch: 17   Global Step: 211710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:46:57,418-Speed 2987.29 samples/sec   Loss 1.4524   LearningRate 0.0022   Epoch: 17   Global Step: 211720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:00,785-Speed 3042.61 samples/sec   Loss 1.3912   LearningRate 0.0022   Epoch: 17   Global Step: 211730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:04,187-Speed 3011.09 samples/sec   Loss 1.4792   LearningRate 0.0022   Epoch: 17   Global Step: 211740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:07,652-Speed 2955.54 samples/sec   Loss 1.4871   LearningRate 0.0022   Epoch: 17   Global Step: 211750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:11,022-Speed 3039.82 samples/sec   Loss 1.4564   LearningRate 0.0022   Epoch: 17   Global Step: 211760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:14,420-Speed 3014.08 samples/sec   Loss 1.4311   LearningRate 0.0022   Epoch: 17   Global Step: 211770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:17,824-Speed 3009.83 samples/sec   Loss 1.3999   LearningRate 0.0022   Epoch: 17   Global Step: 211780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:21,147-Speed 3082.22 samples/sec   Loss 1.4012   LearningRate 0.0022   Epoch: 17   Global Step: 211790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:47:24,485-Speed 3069.17 samples/sec   Loss 1.4676   LearningRate 0.0022   Epoch: 17   Global Step: 211800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:47:27,881-Speed 3015.18 samples/sec   Loss 1.4008   LearningRate 0.0022   Epoch: 17   Global Step: 211810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:47:31,290-Speed 3005.35 samples/sec   Loss 1.5171   LearningRate 0.0022   Epoch: 17   Global Step: 211820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:47:34,605-Speed 3089.78 samples/sec   Loss 1.4681   LearningRate 0.0022   Epoch: 17   Global Step: 211830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:47:38,027-Speed 2992.86 samples/sec   Loss 1.4559   LearningRate 0.0022   Epoch: 17   Global Step: 211840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:41,421-Speed 3017.91 samples/sec   Loss 1.4363   LearningRate 0.0022   Epoch: 17   Global Step: 211850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:44,834-Speed 3001.68 samples/sec   Loss 1.4142   LearningRate 0.0022   Epoch: 17   Global Step: 211860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:48,270-Speed 2980.07 samples/sec   Loss 1.4697   LearningRate 0.0022   Epoch: 17   Global Step: 211870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:51,681-Speed 3003.28 samples/sec   Loss 1.4595   LearningRate 0.0022   Epoch: 17   Global Step: 211880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:55,062-Speed 3029.64 samples/sec   Loss 1.4453   LearningRate 0.0022   Epoch: 17   Global Step: 211890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:47:58,487-Speed 2990.49 samples/sec   Loss 1.4320   LearningRate 0.0022   Epoch: 17   Global Step: 211900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:48:01,926-Speed 2978.49 samples/sec   Loss 1.5065   LearningRate 0.0022   Epoch: 17   Global Step: 211910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:48:05,318-Speed 3020.47 samples/sec   Loss 1.4193   LearningRate 0.0022   Epoch: 17   Global Step: 211920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:48:08,695-Speed 3032.76 samples/sec   Loss 1.4561   LearningRate 0.0022   Epoch: 17   Global Step: 211930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:48:12,079-Speed 3026.98 samples/sec   Loss 1.4572   LearningRate 0.0022   Epoch: 17   Global Step: 211940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:48:15,463-Speed 3026.77 samples/sec   Loss 1.4904   LearningRate 0.0022   Epoch: 17   Global Step: 211950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:48:18,897-Speed 2982.15 samples/sec   Loss 1.4256   LearningRate 0.0022   Epoch: 17   Global Step: 211960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:48:22,246-Speed 3059.09 samples/sec   Loss 1.4072   LearningRate 0.0022   Epoch: 17   Global Step: 211970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:48:25,660-Speed 3000.20 samples/sec   Loss 1.4895   LearningRate 0.0022   Epoch: 17   Global Step: 211980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:48:29,112-Speed 2967.30 samples/sec   Loss 1.3886   LearningRate 0.0022   Epoch: 17   Global Step: 211990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:48:32,561-Speed 2969.97 samples/sec   Loss 1.4383   LearningRate 0.0021   Epoch: 17   Global Step: 212000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:48:35,936-Speed 3035.21 samples/sec   Loss 1.4755   LearningRate 0.0021   Epoch: 17   Global Step: 212010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:48:39,318-Speed 3028.37 samples/sec   Loss 1.4513   LearningRate 0.0021   Epoch: 17   Global Step: 212020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:48:42,695-Speed 3033.86 samples/sec   Loss 1.4532   LearningRate 0.0021   Epoch: 17   Global Step: 212030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:48:46,067-Speed 3037.08 samples/sec   Loss 1.4027   LearningRate 0.0021   Epoch: 17   Global Step: 212040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 21:48:49,469-Speed 3010.94 samples/sec   Loss 1.4738   LearningRate 0.0021   Epoch: 17   Global Step: 212050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:48:52,885-Speed 2998.57 samples/sec   Loss 1.5014   LearningRate 0.0021   Epoch: 17   Global Step: 212060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:48:56,289-Speed 3008.45 samples/sec   Loss 1.5075   LearningRate 0.0021   Epoch: 17   Global Step: 212070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:48:59,722-Speed 2984.43 samples/sec   Loss 1.4700   LearningRate 0.0021   Epoch: 17   Global Step: 212080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:49:03,090-Speed 3040.83 samples/sec   Loss 1.4939   LearningRate 0.0021   Epoch: 17   Global Step: 212090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:49:06,498-Speed 3005.52 samples/sec   Loss 1.5269   LearningRate 0.0021   Epoch: 17   Global Step: 212100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:49:09,916-Speed 2997.54 samples/sec   Loss 1.4768   LearningRate 0.0021   Epoch: 17   Global Step: 212110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:49:13,366-Speed 2968.64 samples/sec   Loss 1.4558   LearningRate 0.0021   Epoch: 17   Global Step: 212120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:49:16,702-Speed 3070.67 samples/sec   Loss 1.4537   LearningRate 0.0021   Epoch: 17   Global Step: 212130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:49:20,177-Speed 2947.76 samples/sec   Loss 1.4349   LearningRate 0.0021   Epoch: 17   Global Step: 212140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:49:23,620-Speed 2974.96 samples/sec   Loss 1.4642   LearningRate 0.0021   Epoch: 17   Global Step: 212150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:49:26,975-Speed 3053.06 samples/sec   Loss 1.5006   LearningRate 0.0021   Epoch: 17   Global Step: 212160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:49:30,375-Speed 3011.84 samples/sec   Loss 1.4480   LearningRate 0.0021   Epoch: 17   Global Step: 212170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:49:33,734-Speed 3049.64 samples/sec   Loss 1.5333   LearningRate 0.0021   Epoch: 17   Global Step: 212180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:49:37,055-Speed 3084.59 samples/sec   Loss 1.4575   LearningRate 0.0021   Epoch: 17   Global Step: 212190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:49:40,427-Speed 3036.93 samples/sec   Loss 1.5002   LearningRate 0.0021   Epoch: 17   Global Step: 212200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:49:43,835-Speed 3005.88 samples/sec   Loss 1.4352   LearningRate 0.0021   Epoch: 17   Global Step: 212210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:49:47,265-Speed 2986.70 samples/sec   Loss 1.4275   LearningRate 0.0021   Epoch: 17   Global Step: 212220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:49:50,665-Speed 3012.30 samples/sec   Loss 1.4590   LearningRate 0.0021   Epoch: 17   Global Step: 212230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:49:54,052-Speed 3024.61 samples/sec   Loss 1.4451   LearningRate 0.0021   Epoch: 17   Global Step: 212240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:49:57,459-Speed 3005.69 samples/sec   Loss 1.5011   LearningRate 0.0021   Epoch: 17   Global Step: 212250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:00,887-Speed 2988.13 samples/sec   Loss 1.5125   LearningRate 0.0021   Epoch: 17   Global Step: 212260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:04,310-Speed 2992.58 samples/sec   Loss 1.4270   LearningRate 0.0021   Epoch: 17   Global Step: 212270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:07,806-Speed 2929.71 samples/sec   Loss 1.4941   LearningRate 0.0021   Epoch: 17   Global Step: 212280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:11,168-Speed 3046.71 samples/sec   Loss 1.4666   LearningRate 0.0021   Epoch: 17   Global Step: 212290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:14,563-Speed 3016.75 samples/sec   Loss 1.5231   LearningRate 0.0021   Epoch: 17   Global Step: 212300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:17,922-Speed 3049.60 samples/sec   Loss 1.4714   LearningRate 0.0021   Epoch: 17   Global Step: 212310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:21,361-Speed 2978.89 samples/sec   Loss 1.5415   LearningRate 0.0021   Epoch: 17   Global Step: 212320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:24,771-Speed 3003.70 samples/sec   Loss 1.5150   LearningRate 0.0021   Epoch: 17   Global Step: 212330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:28,103-Speed 3073.58 samples/sec   Loss 1.5055   LearningRate 0.0021   Epoch: 17   Global Step: 212340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:31,477-Speed 3036.53 samples/sec   Loss 1.4809   LearningRate 0.0021   Epoch: 17   Global Step: 212350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 21:50:34,856-Speed 3030.87 samples/sec   Loss 1.5158   LearningRate 0.0021   Epoch: 17   Global Step: 212360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 21:50:38,166-Speed 3094.60 samples/sec   Loss 1.4991   LearningRate 0.0021   Epoch: 17   Global Step: 212370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:41,495-Speed 3076.92 samples/sec   Loss 1.4581   LearningRate 0.0021   Epoch: 17   Global Step: 212380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:44,810-Speed 3090.32 samples/sec   Loss 1.4998   LearningRate 0.0021   Epoch: 17   Global Step: 212390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:48,135-Speed 3079.95 samples/sec   Loss 1.4400   LearningRate 0.0021   Epoch: 17   Global Step: 212400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:50:51,474-Speed 3068.23 samples/sec   Loss 1.5176   LearningRate 0.0021   Epoch: 17   Global Step: 212410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:50:54,925-Speed 2967.71 samples/sec   Loss 1.5154   LearningRate 0.0021   Epoch: 17   Global Step: 212420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:50:58,318-Speed 3018.75 samples/sec   Loss 1.4878   LearningRate 0.0021   Epoch: 17   Global Step: 212430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:51:01,694-Speed 3034.05 samples/sec   Loss 1.4988   LearningRate 0.0021   Epoch: 17   Global Step: 212440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:51:05,101-Speed 3007.03 samples/sec   Loss 1.4409   LearningRate 0.0021   Epoch: 17   Global Step: 212450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:51:08,401-Speed 3104.55 samples/sec   Loss 1.5465   LearningRate 0.0021   Epoch: 17   Global Step: 212460   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:51:11,726-Speed 3080.26 samples/sec   Loss 1.4854   LearningRate 0.0021   Epoch: 17   Global Step: 212470   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:51:15,050-Speed 3082.03 samples/sec   Loss 1.4749   LearningRate 0.0021   Epoch: 17   Global Step: 212480   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:51:18,352-Speed 3101.52 samples/sec   Loss 1.4653   LearningRate 0.0021   Epoch: 17   Global Step: 212490   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:51:21,814-Speed 2958.27 samples/sec   Loss 1.4878   LearningRate 0.0021   Epoch: 17   Global Step: 212500   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:51:25,164-Speed 3058.11 samples/sec   Loss 1.5225   LearningRate 0.0021   Epoch: 17   Global Step: 212510   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:51:28,534-Speed 3039.85 samples/sec   Loss 1.4643   LearningRate 0.0021   Epoch: 17   Global Step: 212520   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:51:31,904-Speed 3039.33 samples/sec   Loss 1.4725   LearningRate 0.0021   Epoch: 17   Global Step: 212530   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:51:35,243-Speed 3068.94 samples/sec   Loss 1.4634   LearningRate 0.0021   Epoch: 17   Global Step: 212540   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:51:38,651-Speed 3005.11 samples/sec   Loss 1.4290   LearningRate 0.0021   Epoch: 17   Global Step: 212550   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:51:42,086-Speed 2982.62 samples/sec   Loss 1.4888   LearningRate 0.0021   Epoch: 17   Global Step: 212560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:51:45,465-Speed 3031.09 samples/sec   Loss 1.5369   LearningRate 0.0021   Epoch: 17   Global Step: 212570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:51:48,770-Speed 3098.84 samples/sec   Loss 1.5682   LearningRate 0.0021   Epoch: 17   Global Step: 212580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:51:52,116-Speed 3062.14 samples/sec   Loss 1.4777   LearningRate 0.0021   Epoch: 17   Global Step: 212590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:51:55,541-Speed 2990.35 samples/sec   Loss 1.5523   LearningRate 0.0021   Epoch: 17   Global Step: 212600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:51:58,912-Speed 3038.61 samples/sec   Loss 1.4996   LearningRate 0.0021   Epoch: 17   Global Step: 212610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:52:02,253-Speed 3065.88 samples/sec   Loss 1.4696   LearningRate 0.0021   Epoch: 17   Global Step: 212620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:52:05,573-Speed 3085.13 samples/sec   Loss 1.4878   LearningRate 0.0021   Epoch: 17   Global Step: 212630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:52:08,911-Speed 3068.60 samples/sec   Loss 1.4646   LearningRate 0.0021   Epoch: 17   Global Step: 212640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:52:12,328-Speed 2997.70 samples/sec   Loss 1.4818   LearningRate 0.0021   Epoch: 17   Global Step: 212650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:52:15,708-Speed 3030.89 samples/sec   Loss 1.4609   LearningRate 0.0021   Epoch: 17   Global Step: 212660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:52:19,080-Speed 3037.54 samples/sec   Loss 1.4871   LearningRate 0.0021   Epoch: 17   Global Step: 212670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:52:22,438-Speed 3050.13 samples/sec   Loss 1.4793   LearningRate 0.0021   Epoch: 17   Global Step: 212680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:52:25,753-Speed 3089.55 samples/sec   Loss 1.5168   LearningRate 0.0021   Epoch: 17   Global Step: 212690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:52:29,166-Speed 3001.55 samples/sec   Loss 1.5081   LearningRate 0.0021   Epoch: 17   Global Step: 212700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:52:32,460-Speed 3109.14 samples/sec   Loss 1.4764   LearningRate 0.0021   Epoch: 17   Global Step: 212710   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:52:35,795-Speed 3072.03 samples/sec   Loss 1.4470   LearningRate 0.0021   Epoch: 17   Global Step: 212720   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:52:39,125-Speed 3076.11 samples/sec   Loss 1.5119   LearningRate 0.0021   Epoch: 17   Global Step: 212730   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:52:42,448-Speed 3081.93 samples/sec   Loss 1.5319   LearningRate 0.0021   Epoch: 17   Global Step: 212740   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:52:45,868-Speed 2995.16 samples/sec   Loss 1.5349   LearningRate 0.0021   Epoch: 17   Global Step: 212750   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:52:49,234-Speed 3043.01 samples/sec   Loss 1.4766   LearningRate 0.0021   Epoch: 17   Global Step: 212760   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:52:52,733-Speed 2927.62 samples/sec   Loss 1.5100   LearningRate 0.0021   Epoch: 17   Global Step: 212770   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:52:56,157-Speed 2991.60 samples/sec   Loss 1.5252   LearningRate 0.0021   Epoch: 17   Global Step: 212780   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:52:59,506-Speed 3057.77 samples/sec   Loss 1.4953   LearningRate 0.0021   Epoch: 17   Global Step: 212790   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:53:02,883-Speed 3033.66 samples/sec   Loss 1.5045   LearningRate 0.0021   Epoch: 17   Global Step: 212800   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 21:53:06,307-Speed 2991.74 samples/sec   Loss 1.4617   LearningRate 0.0021   Epoch: 17   Global Step: 212810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:53:09,690-Speed 3027.94 samples/sec   Loss 1.4739   LearningRate 0.0021   Epoch: 17   Global Step: 212820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:53:13,030-Speed 3066.87 samples/sec   Loss 1.5419   LearningRate 0.0021   Epoch: 17   Global Step: 212830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:53:16,364-Speed 3071.71 samples/sec   Loss 1.4787   LearningRate 0.0021   Epoch: 17   Global Step: 212840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:53:19,753-Speed 3022.53 samples/sec   Loss 1.4997   LearningRate 0.0021   Epoch: 17   Global Step: 212850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:53:23,130-Speed 3032.98 samples/sec   Loss 1.4752   LearningRate 0.0020   Epoch: 17   Global Step: 212860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:53:26,609-Speed 2944.14 samples/sec   Loss 1.5173   LearningRate 0.0020   Epoch: 17   Global Step: 212870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:53:30,000-Speed 3020.61 samples/sec   Loss 1.5202   LearningRate 0.0020   Epoch: 17   Global Step: 212880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:53:33,442-Speed 2975.82 samples/sec   Loss 1.4798   LearningRate 0.0020   Epoch: 17   Global Step: 212890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:53:36,922-Speed 2943.35 samples/sec   Loss 1.5517   LearningRate 0.0020   Epoch: 17   Global Step: 212900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:53:40,373-Speed 2968.41 samples/sec   Loss 1.4630   LearningRate 0.0020   Epoch: 17   Global Step: 212910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:53:43,767-Speed 3017.75 samples/sec   Loss 1.5229   LearningRate 0.0020   Epoch: 17   Global Step: 212920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:53:47,204-Speed 2981.21 samples/sec   Loss 1.5084   LearningRate 0.0020   Epoch: 17   Global Step: 212930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:53:50,597-Speed 3018.67 samples/sec   Loss 1.5273   LearningRate 0.0020   Epoch: 17   Global Step: 212940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:53:53,999-Speed 3010.79 samples/sec   Loss 1.5392   LearningRate 0.0020   Epoch: 17   Global Step: 212950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:53:57,394-Speed 3016.88 samples/sec   Loss 1.4854   LearningRate 0.0020   Epoch: 17   Global Step: 212960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:54:00,769-Speed 3035.56 samples/sec   Loss 1.5363   LearningRate 0.0020   Epoch: 17   Global Step: 212970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:54:04,141-Speed 3038.04 samples/sec   Loss 1.4679   LearningRate 0.0020   Epoch: 17   Global Step: 212980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:54:07,515-Speed 3035.09 samples/sec   Loss 1.4843   LearningRate 0.0020   Epoch: 17   Global Step: 212990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:54:10,872-Speed 3051.42 samples/sec   Loss 1.4783   LearningRate 0.0020   Epoch: 17   Global Step: 213000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:54:14,265-Speed 3019.01 samples/sec   Loss 1.4785   LearningRate 0.0020   Epoch: 17   Global Step: 213010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:54:17,602-Speed 3069.16 samples/sec   Loss 1.4886   LearningRate 0.0020   Epoch: 17   Global Step: 213020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:54:21,062-Speed 2960.65 samples/sec   Loss 1.5404   LearningRate 0.0020   Epoch: 17   Global Step: 213030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:54:24,378-Speed 3088.96 samples/sec   Loss 1.4904   LearningRate 0.0020   Epoch: 17   Global Step: 213040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:54:27,725-Speed 3059.95 samples/sec   Loss 1.5415   LearningRate 0.0020   Epoch: 17   Global Step: 213050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:54:31,108-Speed 3028.55 samples/sec   Loss 1.5228   LearningRate 0.0020   Epoch: 17   Global Step: 213060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:54:34,557-Speed 2969.98 samples/sec   Loss 1.5032   LearningRate 0.0020   Epoch: 17   Global Step: 213070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:54:38,127-Speed 2869.20 samples/sec   Loss 1.5513   LearningRate 0.0020   Epoch: 17   Global Step: 213080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:54:41,597-Speed 2952.33 samples/sec   Loss 1.5779   LearningRate 0.0020   Epoch: 17   Global Step: 213090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:54:45,019-Speed 2993.30 samples/sec   Loss 1.4548   LearningRate 0.0020   Epoch: 17   Global Step: 213100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:54:48,372-Speed 3054.47 samples/sec   Loss 1.5211   LearningRate 0.0020   Epoch: 17   Global Step: 213110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:54:51,763-Speed 3021.08 samples/sec   Loss 1.4839   LearningRate 0.0020   Epoch: 17   Global Step: 213120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:54:55,151-Speed 3023.67 samples/sec   Loss 1.5216   LearningRate 0.0020   Epoch: 17   Global Step: 213130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:54:58,489-Speed 3068.13 samples/sec   Loss 1.4664   LearningRate 0.0020   Epoch: 17   Global Step: 213140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:55:01,934-Speed 2974.12 samples/sec   Loss 1.5391   LearningRate 0.0020   Epoch: 17   Global Step: 213150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:55:05,332-Speed 3014.46 samples/sec   Loss 1.5556   LearningRate 0.0020   Epoch: 17   Global Step: 213160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 21:55:08,704-Speed 3037.23 samples/sec   Loss 1.4914   LearningRate 0.0020   Epoch: 17   Global Step: 213170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:55:12,067-Speed 3046.01 samples/sec   Loss 1.5195   LearningRate 0.0020   Epoch: 17   Global Step: 213180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:55:15,488-Speed 2994.47 samples/sec   Loss 1.5109   LearningRate 0.0020   Epoch: 17   Global Step: 213190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:55:18,941-Speed 2966.39 samples/sec   Loss 1.5319   LearningRate 0.0020   Epoch: 17   Global Step: 213200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:55:22,314-Speed 3036.18 samples/sec   Loss 1.5869   LearningRate 0.0020   Epoch: 17   Global Step: 213210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:55:25,734-Speed 2995.24 samples/sec   Loss 1.5645   LearningRate 0.0020   Epoch: 17   Global Step: 213220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:55:29,198-Speed 2957.01 samples/sec   Loss 1.5103   LearningRate 0.0020   Epoch: 17   Global Step: 213230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:55:32,654-Speed 2963.25 samples/sec   Loss 1.5551   LearningRate 0.0020   Epoch: 17   Global Step: 213240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:55:36,030-Speed 3033.76 samples/sec   Loss 1.4931   LearningRate 0.0020   Epoch: 17   Global Step: 213250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:55:39,438-Speed 3006.34 samples/sec   Loss 1.5154   LearningRate 0.0020   Epoch: 17   Global Step: 213260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:55:42,881-Speed 2974.73 samples/sec   Loss 1.4590   LearningRate 0.0020   Epoch: 17   Global Step: 213270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:55:46,268-Speed 3024.17 samples/sec   Loss 1.4810   LearningRate 0.0020   Epoch: 17   Global Step: 213280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:55:49,720-Speed 2966.85 samples/sec   Loss 1.5166   LearningRate 0.0020   Epoch: 17   Global Step: 213290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:55:53,080-Speed 3048.89 samples/sec   Loss 1.5183   LearningRate 0.0020   Epoch: 17   Global Step: 213300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:55:56,494-Speed 2999.55 samples/sec   Loss 1.5072   LearningRate 0.0020   Epoch: 17   Global Step: 213310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:55:59,872-Speed 3032.02 samples/sec   Loss 1.4911   LearningRate 0.0020   Epoch: 17   Global Step: 213320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:03,243-Speed 3044.00 samples/sec   Loss 1.5708   LearningRate 0.0020   Epoch: 17   Global Step: 213330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:06,627-Speed 3026.12 samples/sec   Loss 1.4920   LearningRate 0.0020   Epoch: 17   Global Step: 213340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:10,085-Speed 2962.55 samples/sec   Loss 1.5183   LearningRate 0.0020   Epoch: 17   Global Step: 213350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:13,414-Speed 3076.10 samples/sec   Loss 1.5356   LearningRate 0.0020   Epoch: 17   Global Step: 213360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:16,730-Speed 3088.69 samples/sec   Loss 1.4869   LearningRate 0.0020   Epoch: 17   Global Step: 213370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:20,210-Speed 2944.57 samples/sec   Loss 1.4801   LearningRate 0.0020   Epoch: 17   Global Step: 213380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:23,615-Speed 3007.51 samples/sec   Loss 1.5364   LearningRate 0.0020   Epoch: 17   Global Step: 213390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:27,038-Speed 2991.95 samples/sec   Loss 1.5679   LearningRate 0.0020   Epoch: 17   Global Step: 213400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:30,396-Speed 3050.33 samples/sec   Loss 1.4618   LearningRate 0.0020   Epoch: 17   Global Step: 213410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:33,790-Speed 3019.17 samples/sec   Loss 1.5374   LearningRate 0.0020   Epoch: 17   Global Step: 213420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:37,165-Speed 3034.09 samples/sec   Loss 1.4998   LearningRate 0.0020   Epoch: 17   Global Step: 213430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:40,572-Speed 3006.77 samples/sec   Loss 1.4937   LearningRate 0.0020   Epoch: 17   Global Step: 213440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:43,883-Speed 3094.01 samples/sec   Loss 1.5271   LearningRate 0.0020   Epoch: 17   Global Step: 213450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:47,210-Speed 3078.08 samples/sec   Loss 1.5214   LearningRate 0.0020   Epoch: 17   Global Step: 213460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:50,543-Speed 3073.20 samples/sec   Loss 1.4684   LearningRate 0.0020   Epoch: 17   Global Step: 213470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:53,906-Speed 3046.08 samples/sec   Loss 1.4925   LearningRate 0.0020   Epoch: 17   Global Step: 213480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:56:57,386-Speed 2943.35 samples/sec   Loss 1.5551   LearningRate 0.0020   Epoch: 17   Global Step: 213490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:57:00,878-Speed 2933.15 samples/sec   Loss 1.4772   LearningRate 0.0020   Epoch: 17   Global Step: 213500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:57:04,280-Speed 3010.59 samples/sec   Loss 1.5432   LearningRate 0.0020   Epoch: 17   Global Step: 213510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:57:07,690-Speed 3004.04 samples/sec   Loss 1.5526   LearningRate 0.0020   Epoch: 17   Global Step: 213520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:57:11,027-Speed 3069.29 samples/sec   Loss 1.4798   LearningRate 0.0020   Epoch: 17   Global Step: 213530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:57:14,446-Speed 2995.55 samples/sec   Loss 1.5624   LearningRate 0.0020   Epoch: 17   Global Step: 213540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:57:17,893-Speed 2972.31 samples/sec   Loss 1.5667   LearningRate 0.0020   Epoch: 17   Global Step: 213550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:57:21,322-Speed 2987.07 samples/sec   Loss 1.5902   LearningRate 0.0020   Epoch: 17   Global Step: 213560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:57:24,786-Speed 2956.44 samples/sec   Loss 1.4984   LearningRate 0.0020   Epoch: 17   Global Step: 213570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:57:28,107-Speed 3084.72 samples/sec   Loss 1.4962   LearningRate 0.0020   Epoch: 17   Global Step: 213580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:57:31,430-Speed 3082.11 samples/sec   Loss 1.5283   LearningRate 0.0020   Epoch: 17   Global Step: 213590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:57:34,842-Speed 3001.60 samples/sec   Loss 1.5612   LearningRate 0.0020   Epoch: 17   Global Step: 213600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:57:38,275-Speed 2983.71 samples/sec   Loss 1.5425   LearningRate 0.0020   Epoch: 17   Global Step: 213610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:57:41,648-Speed 3037.72 samples/sec   Loss 1.5837   LearningRate 0.0020   Epoch: 17   Global Step: 213620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:57:45,035-Speed 3023.87 samples/sec   Loss 1.5040   LearningRate 0.0020   Epoch: 17   Global Step: 213630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:57:48,378-Speed 3064.56 samples/sec   Loss 1.5008   LearningRate 0.0020   Epoch: 17   Global Step: 213640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:57:51,766-Speed 3022.61 samples/sec   Loss 1.5357   LearningRate 0.0020   Epoch: 17   Global Step: 213650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:57:55,127-Speed 3047.54 samples/sec   Loss 1.5475   LearningRate 0.0020   Epoch: 17   Global Step: 213660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:57:58,483-Speed 3052.80 samples/sec   Loss 1.5770   LearningRate 0.0020   Epoch: 17   Global Step: 213670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:02,001-Speed 2911.65 samples/sec   Loss 1.5719   LearningRate 0.0020   Epoch: 17   Global Step: 213680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:05,393-Speed 3019.59 samples/sec   Loss 1.5666   LearningRate 0.0020   Epoch: 17   Global Step: 213690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:08,771-Speed 3032.34 samples/sec   Loss 1.5313   LearningRate 0.0020   Epoch: 17   Global Step: 213700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:12,129-Speed 3050.65 samples/sec   Loss 1.4748   LearningRate 0.0020   Epoch: 17   Global Step: 213710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:15,529-Speed 3012.69 samples/sec   Loss 1.5425   LearningRate 0.0020   Epoch: 17   Global Step: 213720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:18,936-Speed 3006.24 samples/sec   Loss 1.5535   LearningRate 0.0020   Epoch: 17   Global Step: 213730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:22,367-Speed 2985.44 samples/sec   Loss 1.5480   LearningRate 0.0019   Epoch: 17   Global Step: 213740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:25,749-Speed 3028.84 samples/sec   Loss 1.5021   LearningRate 0.0019   Epoch: 17   Global Step: 213750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:29,086-Speed 3068.84 samples/sec   Loss 1.5274   LearningRate 0.0019   Epoch: 17   Global Step: 213760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 21:58:32,472-Speed 3025.79 samples/sec   Loss 1.5762   LearningRate 0.0019   Epoch: 17   Global Step: 213770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:35,950-Speed 2944.68 samples/sec   Loss 1.5217   LearningRate 0.0019   Epoch: 17   Global Step: 213780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:39,271-Speed 3084.01 samples/sec   Loss 1.5184   LearningRate 0.0019   Epoch: 17   Global Step: 213790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:42,612-Speed 3066.52 samples/sec   Loss 1.4598   LearningRate 0.0019   Epoch: 17   Global Step: 213800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:45,931-Speed 3085.22 samples/sec   Loss 1.5122   LearningRate 0.0019   Epoch: 17   Global Step: 213810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:49,301-Speed 3040.01 samples/sec   Loss 1.5862   LearningRate 0.0019   Epoch: 17   Global Step: 213820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:52,667-Speed 3043.26 samples/sec   Loss 1.5213   LearningRate 0.0019   Epoch: 17   Global Step: 213830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:56,102-Speed 2981.62 samples/sec   Loss 1.5740   LearningRate 0.0019   Epoch: 17   Global Step: 213840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:58:59,543-Speed 2976.92 samples/sec   Loss 1.5894   LearningRate 0.0019   Epoch: 17   Global Step: 213850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:59:02,989-Speed 2972.76 samples/sec   Loss 1.5796   LearningRate 0.0019   Epoch: 17   Global Step: 213860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:59:06,355-Speed 3042.68 samples/sec   Loss 1.5011   LearningRate 0.0019   Epoch: 17   Global Step: 213870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 21:59:09,740-Speed 3025.69 samples/sec   Loss 1.5236   LearningRate 0.0019   Epoch: 17   Global Step: 213880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 21:59:13,079-Speed 3067.65 samples/sec   Loss 1.5204   LearningRate 0.0019   Epoch: 17   Global Step: 213890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:59:16,407-Speed 3077.42 samples/sec   Loss 1.5962   LearningRate 0.0019   Epoch: 17   Global Step: 213900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:59:19,775-Speed 3041.50 samples/sec   Loss 1.5563   LearningRate 0.0019   Epoch: 17   Global Step: 213910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:59:23,140-Speed 3043.32 samples/sec   Loss 1.6017   LearningRate 0.0019   Epoch: 17   Global Step: 213920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:59:26,481-Speed 3066.39 samples/sec   Loss 1.5179   LearningRate 0.0019   Epoch: 17   Global Step: 213930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:59:29,873-Speed 3019.90 samples/sec   Loss 1.5415   LearningRate 0.0019   Epoch: 17   Global Step: 213940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:59:33,279-Speed 3007.04 samples/sec   Loss 1.5315   LearningRate 0.0019   Epoch: 17   Global Step: 213950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 21:59:36,619-Speed 3066.92 samples/sec   Loss 1.5627   LearningRate 0.0019   Epoch: 17   Global Step: 213960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:59:39,998-Speed 3030.92 samples/sec   Loss 1.5310   LearningRate 0.0019   Epoch: 17   Global Step: 213970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:59:43,380-Speed 3029.33 samples/sec   Loss 1.5770   LearningRate 0.0019   Epoch: 17   Global Step: 213980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:59:46,821-Speed 2976.47 samples/sec   Loss 1.5254   LearningRate 0.0019   Epoch: 17   Global Step: 213990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:59:50,252-Speed 2985.36 samples/sec   Loss 1.5838   LearningRate 0.0019   Epoch: 17   Global Step: 214000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:59:53,636-Speed 3026.72 samples/sec   Loss 1.5254   LearningRate 0.0019   Epoch: 17   Global Step: 214010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 21:59:56,964-Speed 3078.11 samples/sec   Loss 1.5493   LearningRate 0.0019   Epoch: 17   Global Step: 214020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:00:00,297-Speed 3073.49 samples/sec   Loss 1.5485   LearningRate 0.0019   Epoch: 17   Global Step: 214030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:00:03,700-Speed 3009.82 samples/sec   Loss 1.5323   LearningRate 0.0019   Epoch: 17   Global Step: 214040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:00:07,063-Speed 3045.81 samples/sec   Loss 1.5361   LearningRate 0.0019   Epoch: 17   Global Step: 214050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:00:10,515-Speed 2967.00 samples/sec   Loss 1.4974   LearningRate 0.0019   Epoch: 17   Global Step: 214060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:00:13,863-Speed 3060.75 samples/sec   Loss 1.5197   LearningRate 0.0019   Epoch: 17   Global Step: 214070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:00:17,288-Speed 2990.84 samples/sec   Loss 1.5451   LearningRate 0.0019   Epoch: 17   Global Step: 214080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:00:20,687-Speed 3013.15 samples/sec   Loss 1.4919   LearningRate 0.0019   Epoch: 17   Global Step: 214090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:00:24,025-Speed 3069.09 samples/sec   Loss 1.5445   LearningRate 0.0019   Epoch: 17   Global Step: 214100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:00:27,404-Speed 3031.01 samples/sec   Loss 1.5803   LearningRate 0.0019   Epoch: 17   Global Step: 214110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:00:30,841-Speed 2980.56 samples/sec   Loss 1.5519   LearningRate 0.0019   Epoch: 17   Global Step: 214120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:00:34,185-Speed 3063.32 samples/sec   Loss 1.5654   LearningRate 0.0019   Epoch: 17   Global Step: 214130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:00:37,592-Speed 3006.27 samples/sec   Loss 1.5553   LearningRate 0.0019   Epoch: 17   Global Step: 214140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:00:40,970-Speed 3031.82 samples/sec   Loss 1.5291   LearningRate 0.0019   Epoch: 17   Global Step: 214150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:00:44,466-Speed 2929.95 samples/sec   Loss 1.4982   LearningRate 0.0019   Epoch: 17   Global Step: 214160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:00:47,803-Speed 3070.07 samples/sec   Loss 1.6213   LearningRate 0.0019   Epoch: 17   Global Step: 214170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:00:51,206-Speed 3009.72 samples/sec   Loss 1.5029   LearningRate 0.0019   Epoch: 17   Global Step: 214180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:00:54,586-Speed 3030.96 samples/sec   Loss 1.5370   LearningRate 0.0019   Epoch: 17   Global Step: 214190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:00:57,963-Speed 3033.01 samples/sec   Loss 1.5259   LearningRate 0.0019   Epoch: 17   Global Step: 214200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:01:01,300-Speed 3069.27 samples/sec   Loss 1.5453   LearningRate 0.0019   Epoch: 17   Global Step: 214210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:01:04,725-Speed 2991.02 samples/sec   Loss 1.5381   LearningRate 0.0019   Epoch: 17   Global Step: 214220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:01:08,270-Speed 2888.83 samples/sec   Loss 1.5545   LearningRate 0.0019   Epoch: 17   Global Step: 214230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:01:11,715-Speed 2974.28 samples/sec   Loss 1.5413   LearningRate 0.0019   Epoch: 17   Global Step: 214240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:01:15,065-Speed 3056.88 samples/sec   Loss 1.5301   LearningRate 0.0019   Epoch: 17   Global Step: 214250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:01:18,508-Speed 2975.61 samples/sec   Loss 1.5322   LearningRate 0.0019   Epoch: 17   Global Step: 214260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:01:21,937-Speed 2986.55 samples/sec   Loss 1.5673   LearningRate 0.0019   Epoch: 17   Global Step: 214270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:01:25,348-Speed 3003.29 samples/sec   Loss 1.5407   LearningRate 0.0019   Epoch: 17   Global Step: 214280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:01:28,714-Speed 3043.39 samples/sec   Loss 1.5964   LearningRate 0.0019   Epoch: 17   Global Step: 214290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:01:32,088-Speed 3035.68 samples/sec   Loss 1.5744   LearningRate 0.0019   Epoch: 17   Global Step: 214300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:01:35,524-Speed 2980.93 samples/sec   Loss 1.5600   LearningRate 0.0019   Epoch: 17   Global Step: 214310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:01:39,027-Speed 2924.98 samples/sec   Loss 1.5019   LearningRate 0.0019   Epoch: 17   Global Step: 214320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:01:42,481-Speed 2965.91 samples/sec   Loss 1.5305   LearningRate 0.0019   Epoch: 17   Global Step: 214330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:01:45,918-Speed 2980.18 samples/sec   Loss 1.5018   LearningRate 0.0019   Epoch: 17   Global Step: 214340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:01:49,336-Speed 2996.65 samples/sec   Loss 1.6165   LearningRate 0.0019   Epoch: 17   Global Step: 214350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:01:52,767-Speed 2985.84 samples/sec   Loss 1.4976   LearningRate 0.0019   Epoch: 17   Global Step: 214360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:01:56,229-Speed 2958.88 samples/sec   Loss 1.5021   LearningRate 0.0019   Epoch: 17   Global Step: 214370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:01:59,684-Speed 2964.87 samples/sec   Loss 1.5758   LearningRate 0.0019   Epoch: 17   Global Step: 214380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:02:03,073-Speed 3022.19 samples/sec   Loss 1.5578   LearningRate 0.0019   Epoch: 17   Global Step: 214390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:02:06,478-Speed 3008.18 samples/sec   Loss 1.5600   LearningRate 0.0019   Epoch: 17   Global Step: 214400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:02:09,851-Speed 3036.83 samples/sec   Loss 1.5933   LearningRate 0.0019   Epoch: 17   Global Step: 214410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:02:13,318-Speed 2954.68 samples/sec   Loss 1.6079   LearningRate 0.0019   Epoch: 17   Global Step: 214420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:02:16,730-Speed 3001.58 samples/sec   Loss 1.5964   LearningRate 0.0019   Epoch: 17   Global Step: 214430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:02:20,088-Speed 3050.50 samples/sec   Loss 1.5734   LearningRate 0.0019   Epoch: 17   Global Step: 214440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:02:23,489-Speed 3011.88 samples/sec   Loss 1.4973   LearningRate 0.0019   Epoch: 17   Global Step: 214450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:02:26,807-Speed 3086.95 samples/sec   Loss 1.6253   LearningRate 0.0019   Epoch: 17   Global Step: 214460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:02:30,213-Speed 3007.43 samples/sec   Loss 1.5964   LearningRate 0.0019   Epoch: 17   Global Step: 214470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:02:33,536-Speed 3081.80 samples/sec   Loss 1.6045   LearningRate 0.0019   Epoch: 17   Global Step: 214480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:02:36,893-Speed 3051.88 samples/sec   Loss 1.5464   LearningRate 0.0019   Epoch: 17   Global Step: 214490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:02:40,315-Speed 2992.71 samples/sec   Loss 1.5826   LearningRate 0.0019   Epoch: 17   Global Step: 214500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:02:43,638-Speed 3083.31 samples/sec   Loss 1.4854   LearningRate 0.0019   Epoch: 17   Global Step: 214510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:02:47,057-Speed 2995.05 samples/sec   Loss 1.5245   LearningRate 0.0019   Epoch: 17   Global Step: 214520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:02:50,413-Speed 3052.98 samples/sec   Loss 1.5752   LearningRate 0.0019   Epoch: 17   Global Step: 214530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:02:53,793-Speed 3030.27 samples/sec   Loss 1.6013   LearningRate 0.0019   Epoch: 17   Global Step: 214540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:02:57,155-Speed 3046.21 samples/sec   Loss 1.5624   LearningRate 0.0019   Epoch: 17   Global Step: 214550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:03:00,534-Speed 3032.30 samples/sec   Loss 1.5354   LearningRate 0.0019   Epoch: 17   Global Step: 214560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:03:03,912-Speed 3031.82 samples/sec   Loss 1.6318   LearningRate 0.0019   Epoch: 17   Global Step: 214570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:03:07,304-Speed 3019.99 samples/sec   Loss 1.5822   LearningRate 0.0019   Epoch: 17   Global Step: 214580   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:03:10,725-Speed 2993.94 samples/sec   Loss 1.5365   LearningRate 0.0019   Epoch: 17   Global Step: 214590   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:03:14,069-Speed 3062.68 samples/sec   Loss 1.5827   LearningRate 0.0019   Epoch: 17   Global Step: 214600   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:03:17,401-Speed 3074.59 samples/sec   Loss 1.5818   LearningRate 0.0019   Epoch: 17   Global Step: 214610   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:03:20,838-Speed 2980.23 samples/sec   Loss 1.5579   LearningRate 0.0019   Epoch: 17   Global Step: 214620   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:03:24,278-Speed 2977.68 samples/sec   Loss 1.5353   LearningRate 0.0019   Epoch: 17   Global Step: 214630   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:03:27,698-Speed 2994.66 samples/sec   Loss 1.5760   LearningRate 0.0018   Epoch: 17   Global Step: 214640   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:03:31,066-Speed 3041.38 samples/sec   Loss 1.5600   LearningRate 0.0018   Epoch: 17   Global Step: 214650   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:03:34,463-Speed 3015.37 samples/sec   Loss 1.5456   LearningRate 0.0018   Epoch: 17   Global Step: 214660   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:03:37,859-Speed 3016.49 samples/sec   Loss 1.5534   LearningRate 0.0018   Epoch: 17   Global Step: 214670   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:03:41,214-Speed 3053.61 samples/sec   Loss 1.5196   LearningRate 0.0018   Epoch: 17   Global Step: 214680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:03:44,625-Speed 3002.49 samples/sec   Loss 1.5958   LearningRate 0.0018   Epoch: 17   Global Step: 214690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:03:48,018-Speed 3019.17 samples/sec   Loss 1.5255   LearningRate 0.0018   Epoch: 17   Global Step: 214700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:03:51,415-Speed 3015.32 samples/sec   Loss 1.5757   LearningRate 0.0018   Epoch: 17   Global Step: 214710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:03:54,832-Speed 2997.15 samples/sec   Loss 1.5971   LearningRate 0.0018   Epoch: 17   Global Step: 214720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:03:58,180-Speed 3059.59 samples/sec   Loss 1.5790   LearningRate 0.0018   Epoch: 17   Global Step: 214730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:01,568-Speed 3023.52 samples/sec   Loss 1.5409   LearningRate 0.0018   Epoch: 17   Global Step: 214740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:04,969-Speed 3011.79 samples/sec   Loss 1.5441   LearningRate 0.0018   Epoch: 17   Global Step: 214750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:08,357-Speed 3022.79 samples/sec   Loss 1.5295   LearningRate 0.0018   Epoch: 17   Global Step: 214760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:11,731-Speed 3035.94 samples/sec   Loss 1.5497   LearningRate 0.0018   Epoch: 17   Global Step: 214770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:15,074-Speed 3064.60 samples/sec   Loss 1.5905   LearningRate 0.0018   Epoch: 17   Global Step: 214780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:04:18,491-Speed 2997.06 samples/sec   Loss 1.5920   LearningRate 0.0018   Epoch: 17   Global Step: 214790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:04:21,877-Speed 3025.80 samples/sec   Loss 1.5908   LearningRate 0.0018   Epoch: 17   Global Step: 214800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:04:25,249-Speed 3037.00 samples/sec   Loss 1.5499   LearningRate 0.0018   Epoch: 17   Global Step: 214810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:28,684-Speed 2982.60 samples/sec   Loss 1.5334   LearningRate 0.0018   Epoch: 17   Global Step: 214820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:32,184-Speed 2926.64 samples/sec   Loss 1.5873   LearningRate 0.0018   Epoch: 17   Global Step: 214830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:35,511-Speed 3078.13 samples/sec   Loss 1.5669   LearningRate 0.0018   Epoch: 17   Global Step: 214840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:38,925-Speed 2999.87 samples/sec   Loss 1.5920   LearningRate 0.0018   Epoch: 17   Global Step: 214850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:42,345-Speed 2995.88 samples/sec   Loss 1.5402   LearningRate 0.0018   Epoch: 17   Global Step: 214860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:45,819-Speed 2948.16 samples/sec   Loss 1.5570   LearningRate 0.0018   Epoch: 17   Global Step: 214870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:49,291-Speed 2949.92 samples/sec   Loss 1.5856   LearningRate 0.0018   Epoch: 17   Global Step: 214880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:52,698-Speed 3006.38 samples/sec   Loss 1.6055   LearningRate 0.0018   Epoch: 17   Global Step: 214890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:56,114-Speed 2998.75 samples/sec   Loss 1.5465   LearningRate 0.0018   Epoch: 17   Global Step: 214900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:04:59,513-Speed 3014.13 samples/sec   Loss 1.5652   LearningRate 0.0018   Epoch: 17   Global Step: 214910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:05:02,883-Speed 3039.25 samples/sec   Loss 1.5849   LearningRate 0.0018   Epoch: 17   Global Step: 214920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:05:06,204-Speed 3083.55 samples/sec   Loss 1.5966   LearningRate 0.0018   Epoch: 17   Global Step: 214930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:05:09,574-Speed 3038.92 samples/sec   Loss 1.5188   LearningRate 0.0018   Epoch: 17   Global Step: 214940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:05:13,075-Speed 2926.47 samples/sec   Loss 1.5460   LearningRate 0.0018   Epoch: 17   Global Step: 214950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:05:16,407-Speed 3073.89 samples/sec   Loss 1.5719   LearningRate 0.0018   Epoch: 17   Global Step: 214960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:05:19,796-Speed 3022.24 samples/sec   Loss 1.5493   LearningRate 0.0018   Epoch: 17   Global Step: 214970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:05:23,126-Speed 3075.89 samples/sec   Loss 1.5827   LearningRate 0.0018   Epoch: 17   Global Step: 214980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:05:26,569-Speed 2974.71 samples/sec   Loss 1.6277   LearningRate 0.0018   Epoch: 17   Global Step: 214990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:05:29,965-Speed 3017.11 samples/sec   Loss 1.5452   LearningRate 0.0018   Epoch: 17   Global Step: 215000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:05:33,436-Speed 2950.94 samples/sec   Loss 1.5934   LearningRate 0.0018   Epoch: 17   Global Step: 215010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:05:36,830-Speed 3018.06 samples/sec   Loss 1.5605   LearningRate 0.0018   Epoch: 17   Global Step: 215020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:05:40,184-Speed 3053.55 samples/sec   Loss 1.6513   LearningRate 0.0018   Epoch: 17   Global Step: 215030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:05:43,521-Speed 3069.80 samples/sec   Loss 1.5698   LearningRate 0.0018   Epoch: 17   Global Step: 215040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:05:46,842-Speed 3083.84 samples/sec   Loss 1.5849   LearningRate 0.0018   Epoch: 17   Global Step: 215050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:05:50,204-Speed 3046.60 samples/sec   Loss 1.5911   LearningRate 0.0018   Epoch: 17   Global Step: 215060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:05:53,551-Speed 3061.18 samples/sec   Loss 1.6195   LearningRate 0.0018   Epoch: 17   Global Step: 215070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:05:56,893-Speed 3064.33 samples/sec   Loss 1.5877   LearningRate 0.0018   Epoch: 17   Global Step: 215080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:06:00,283-Speed 3022.08 samples/sec   Loss 1.5752   LearningRate 0.0018   Epoch: 17   Global Step: 215090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:06:03,686-Speed 3010.33 samples/sec   Loss 1.5743   LearningRate 0.0018   Epoch: 17   Global Step: 215100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:06:07,045-Speed 3048.67 samples/sec   Loss 1.5475   LearningRate 0.0018   Epoch: 17   Global Step: 215110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:06:10,431-Speed 3025.83 samples/sec   Loss 1.5551   LearningRate 0.0018   Epoch: 17   Global Step: 215120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:06:13,846-Speed 2998.89 samples/sec   Loss 1.5527   LearningRate 0.0018   Epoch: 17   Global Step: 215130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:06:17,198-Speed 3055.48 samples/sec   Loss 1.5608   LearningRate 0.0018   Epoch: 17   Global Step: 215140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:06:20,591-Speed 3019.69 samples/sec   Loss 1.5830   LearningRate 0.0018   Epoch: 17   Global Step: 215150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:06:23,968-Speed 3032.99 samples/sec   Loss 1.5652   LearningRate 0.0018   Epoch: 17   Global Step: 215160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:06:27,333-Speed 3043.99 samples/sec   Loss 1.5599   LearningRate 0.0018   Epoch: 17   Global Step: 215170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:06:30,689-Speed 3052.13 samples/sec   Loss 1.6053   LearningRate 0.0018   Epoch: 17   Global Step: 215180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:06:34,085-Speed 3016.26 samples/sec   Loss 1.5722   LearningRate 0.0018   Epoch: 17   Global Step: 215190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:06:37,437-Speed 3056.54 samples/sec   Loss 1.6236   LearningRate 0.0018   Epoch: 17   Global Step: 215200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:06:40,866-Speed 2986.92 samples/sec   Loss 1.5815   LearningRate 0.0018   Epoch: 17   Global Step: 215210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:06:44,215-Speed 3058.24 samples/sec   Loss 1.6019   LearningRate 0.0018   Epoch: 17   Global Step: 215220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:06:47,560-Speed 3062.07 samples/sec   Loss 1.5873   LearningRate 0.0018   Epoch: 17   Global Step: 215230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:06:50,910-Speed 3057.62 samples/sec   Loss 1.5624   LearningRate 0.0018   Epoch: 17   Global Step: 215240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:06:54,307-Speed 3015.78 samples/sec   Loss 1.5866   LearningRate 0.0018   Epoch: 17   Global Step: 215250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:06:57,705-Speed 3014.70 samples/sec   Loss 1.5784   LearningRate 0.0018   Epoch: 17   Global Step: 215260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:07:01,107-Speed 3010.75 samples/sec   Loss 1.5600   LearningRate 0.0018   Epoch: 17   Global Step: 215270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:04,504-Speed 3015.77 samples/sec   Loss 1.6165   LearningRate 0.0018   Epoch: 17   Global Step: 215280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:07,889-Speed 3025.21 samples/sec   Loss 1.5796   LearningRate 0.0018   Epoch: 17   Global Step: 215290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:11,237-Speed 3059.62 samples/sec   Loss 1.6462   LearningRate 0.0018   Epoch: 17   Global Step: 215300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:14,618-Speed 3029.75 samples/sec   Loss 1.5630   LearningRate 0.0018   Epoch: 17   Global Step: 215310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:17,983-Speed 3043.66 samples/sec   Loss 1.5724   LearningRate 0.0018   Epoch: 17   Global Step: 215320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:21,373-Speed 3021.64 samples/sec   Loss 1.5361   LearningRate 0.0018   Epoch: 17   Global Step: 215330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:24,828-Speed 2964.39 samples/sec   Loss 1.5627   LearningRate 0.0018   Epoch: 17   Global Step: 215340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:28,246-Speed 2996.89 samples/sec   Loss 1.5771   LearningRate 0.0018   Epoch: 17   Global Step: 215350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:31,631-Speed 3026.31 samples/sec   Loss 1.6320   LearningRate 0.0018   Epoch: 17   Global Step: 215360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:35,037-Speed 3007.53 samples/sec   Loss 1.5662   LearningRate 0.0018   Epoch: 17   Global Step: 215370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:07:38,478-Speed 2977.19 samples/sec   Loss 1.5940   LearningRate 0.0018   Epoch: 17   Global Step: 215380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:07:41,820-Speed 3064.75 samples/sec   Loss 1.5655   LearningRate 0.0018   Epoch: 17   Global Step: 215390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:45,192-Speed 3036.70 samples/sec   Loss 1.6267   LearningRate 0.0018   Epoch: 17   Global Step: 215400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:48,524-Speed 3074.75 samples/sec   Loss 1.6303   LearningRate 0.0018   Epoch: 17   Global Step: 215410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:51,905-Speed 3029.87 samples/sec   Loss 1.5818   LearningRate 0.0018   Epoch: 17   Global Step: 215420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:55,232-Speed 3077.88 samples/sec   Loss 1.5974   LearningRate 0.0018   Epoch: 17   Global Step: 215430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:07:58,597-Speed 3043.90 samples/sec   Loss 1.6178   LearningRate 0.0018   Epoch: 17   Global Step: 215440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:01,945-Speed 3059.63 samples/sec   Loss 1.5480   LearningRate 0.0018   Epoch: 17   Global Step: 215450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:05,319-Speed 3035.85 samples/sec   Loss 1.5654   LearningRate 0.0018   Epoch: 17   Global Step: 215460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:08,709-Speed 3021.83 samples/sec   Loss 1.5172   LearningRate 0.0018   Epoch: 17   Global Step: 215470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:12,063-Speed 3053.93 samples/sec   Loss 1.5595   LearningRate 0.0018   Epoch: 17   Global Step: 215480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:15,389-Speed 3079.87 samples/sec   Loss 1.6328   LearningRate 0.0018   Epoch: 17   Global Step: 215490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:18,808-Speed 2995.44 samples/sec   Loss 1.5624   LearningRate 0.0018   Epoch: 17   Global Step: 215500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:22,194-Speed 3025.53 samples/sec   Loss 1.6307   LearningRate 0.0018   Epoch: 17   Global Step: 215510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:25,562-Speed 3041.49 samples/sec   Loss 1.6286   LearningRate 0.0018   Epoch: 17   Global Step: 215520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:28,944-Speed 3029.14 samples/sec   Loss 1.6128   LearningRate 0.0018   Epoch: 17   Global Step: 215530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:32,324-Speed 3030.07 samples/sec   Loss 1.5438   LearningRate 0.0018   Epoch: 17   Global Step: 215540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:35,721-Speed 3015.67 samples/sec   Loss 1.6331   LearningRate 0.0018   Epoch: 17   Global Step: 215550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:39,140-Speed 2996.28 samples/sec   Loss 1.6108   LearningRate 0.0017   Epoch: 17   Global Step: 215560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:42,518-Speed 3032.04 samples/sec   Loss 1.6102   LearningRate 0.0017   Epoch: 17   Global Step: 215570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:45,894-Speed 3034.44 samples/sec   Loss 1.5919   LearningRate 0.0017   Epoch: 17   Global Step: 215580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:49,313-Speed 2995.90 samples/sec   Loss 1.5541   LearningRate 0.0017   Epoch: 17   Global Step: 215590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:08:52,622-Speed 3095.19 samples/sec   Loss 1.6092   LearningRate 0.0017   Epoch: 17   Global Step: 215600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:55,964-Speed 3064.82 samples/sec   Loss 1.5904   LearningRate 0.0017   Epoch: 17   Global Step: 215610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:08:59,347-Speed 3027.42 samples/sec   Loss 1.5930   LearningRate 0.0017   Epoch: 17   Global Step: 215620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:09:02,817-Speed 2952.45 samples/sec   Loss 1.5493   LearningRate 0.0017   Epoch: 17   Global Step: 215630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:09:06,164-Speed 3059.92 samples/sec   Loss 1.6542   LearningRate 0.0017   Epoch: 17   Global Step: 215640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:09,540-Speed 3033.78 samples/sec   Loss 1.6150   LearningRate 0.0017   Epoch: 17   Global Step: 215650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:12,927-Speed 3024.43 samples/sec   Loss 1.5989   LearningRate 0.0017   Epoch: 17   Global Step: 215660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:16,377-Speed 2969.02 samples/sec   Loss 1.6362   LearningRate 0.0017   Epoch: 17   Global Step: 215670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:19,758-Speed 3029.69 samples/sec   Loss 1.5957   LearningRate 0.0017   Epoch: 17   Global Step: 215680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:23,103-Speed 3062.11 samples/sec   Loss 1.5252   LearningRate 0.0017   Epoch: 17   Global Step: 215690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:26,560-Speed 2962.57 samples/sec   Loss 1.6035   LearningRate 0.0017   Epoch: 17   Global Step: 215700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:29,962-Speed 3011.71 samples/sec   Loss 1.6295   LearningRate 0.0017   Epoch: 17   Global Step: 215710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:33,370-Speed 3005.04 samples/sec   Loss 1.6366   LearningRate 0.0017   Epoch: 17   Global Step: 215720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:36,789-Speed 2996.51 samples/sec   Loss 1.5946   LearningRate 0.0017   Epoch: 17   Global Step: 215730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:40,201-Speed 3001.60 samples/sec   Loss 1.6363   LearningRate 0.0017   Epoch: 17   Global Step: 215740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:09:43,558-Speed 3051.81 samples/sec   Loss 1.6439   LearningRate 0.0017   Epoch: 17   Global Step: 215750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:46,969-Speed 3002.43 samples/sec   Loss 1.5924   LearningRate 0.0017   Epoch: 17   Global Step: 215760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:50,413-Speed 2974.16 samples/sec   Loss 1.5381   LearningRate 0.0017   Epoch: 17   Global Step: 215770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:53,838-Speed 2991.37 samples/sec   Loss 1.5454   LearningRate 0.0017   Epoch: 17   Global Step: 215780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:09:57,211-Speed 3035.87 samples/sec   Loss 1.6042   LearningRate 0.0017   Epoch: 17   Global Step: 215790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:10:00,610-Speed 3014.20 samples/sec   Loss 1.6143   LearningRate 0.0017   Epoch: 17   Global Step: 215800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:10:04,017-Speed 3005.86 samples/sec   Loss 1.6495   LearningRate 0.0017   Epoch: 17   Global Step: 215810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:10:07,375-Speed 3050.57 samples/sec   Loss 1.6093   LearningRate 0.0017   Epoch: 17   Global Step: 215820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:10:10,838-Speed 2958.11 samples/sec   Loss 1.5300   LearningRate 0.0017   Epoch: 17   Global Step: 215830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:10:14,232-Speed 3017.74 samples/sec   Loss 1.6082   LearningRate 0.0017   Epoch: 17   Global Step: 215840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:10:17,669-Speed 2979.82 samples/sec   Loss 1.5530   LearningRate 0.0017   Epoch: 17   Global Step: 215850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:10:21,066-Speed 3015.81 samples/sec   Loss 1.5977   LearningRate 0.0017   Epoch: 17   Global Step: 215860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:10:24,429-Speed 3046.16 samples/sec   Loss 1.5574   LearningRate 0.0017   Epoch: 17   Global Step: 215870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:10:27,822-Speed 3017.96 samples/sec   Loss 1.5650   LearningRate 0.0017   Epoch: 17   Global Step: 215880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:10:31,258-Speed 2981.32 samples/sec   Loss 1.6064   LearningRate 0.0017   Epoch: 17   Global Step: 215890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:10:34,643-Speed 3026.37 samples/sec   Loss 1.6150   LearningRate 0.0017   Epoch: 17   Global Step: 215900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:10:37,977-Speed 3072.30 samples/sec   Loss 1.6307   LearningRate 0.0017   Epoch: 17   Global Step: 215910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:10:41,307-Speed 3076.50 samples/sec   Loss 1.5440   LearningRate 0.0017   Epoch: 17   Global Step: 215920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:10:44,691-Speed 3026.59 samples/sec   Loss 1.5536   LearningRate 0.0017   Epoch: 17   Global Step: 215930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:10:48,089-Speed 3013.64 samples/sec   Loss 1.5992   LearningRate 0.0017   Epoch: 17   Global Step: 215940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:10:51,519-Speed 2986.33 samples/sec   Loss 1.5631   LearningRate 0.0017   Epoch: 17   Global Step: 215950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:10:54,897-Speed 3032.59 samples/sec   Loss 1.6319   LearningRate 0.0017   Epoch: 17   Global Step: 215960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:10:58,252-Speed 3053.11 samples/sec   Loss 1.6076   LearningRate 0.0017   Epoch: 17   Global Step: 215970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:01,662-Speed 3003.65 samples/sec   Loss 1.5860   LearningRate 0.0017   Epoch: 17   Global Step: 215980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:04,991-Speed 3077.03 samples/sec   Loss 1.5836   LearningRate 0.0017   Epoch: 17   Global Step: 215990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:08,397-Speed 3007.97 samples/sec   Loss 1.6335   LearningRate 0.0017   Epoch: 17   Global Step: 216000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:11,893-Speed 2929.37 samples/sec   Loss 1.6211   LearningRate 0.0017   Epoch: 17   Global Step: 216010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:15,339-Speed 2972.10 samples/sec   Loss 1.5669   LearningRate 0.0017   Epoch: 17   Global Step: 216020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:18,803-Speed 2957.20 samples/sec   Loss 1.5949   LearningRate 0.0017   Epoch: 17   Global Step: 216030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:22,197-Speed 3018.62 samples/sec   Loss 1.6447   LearningRate 0.0017   Epoch: 17   Global Step: 216040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:25,572-Speed 3034.99 samples/sec   Loss 1.6042   LearningRate 0.0017   Epoch: 17   Global Step: 216050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:11:28,904-Speed 3073.95 samples/sec   Loss 1.6018   LearningRate 0.0017   Epoch: 17   Global Step: 216060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:32,276-Speed 3037.91 samples/sec   Loss 1.5561   LearningRate 0.0017   Epoch: 17   Global Step: 216070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:35,603-Speed 3077.55 samples/sec   Loss 1.5915   LearningRate 0.0017   Epoch: 17   Global Step: 216080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:39,032-Speed 2987.45 samples/sec   Loss 1.5732   LearningRate 0.0017   Epoch: 17   Global Step: 216090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:42,404-Speed 3037.75 samples/sec   Loss 1.5279   LearningRate 0.0017   Epoch: 17   Global Step: 216100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:11:45,742-Speed 3068.37 samples/sec   Loss 1.5305   LearningRate 0.0017   Epoch: 17   Global Step: 216110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:11:49,099-Speed 3051.35 samples/sec   Loss 1.6463   LearningRate 0.0017   Epoch: 17   Global Step: 216120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:11:52,456-Speed 3051.35 samples/sec   Loss 1.6306   LearningRate 0.0017   Epoch: 17   Global Step: 216130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:11:55,793-Speed 3069.52 samples/sec   Loss 1.5845   LearningRate 0.0017   Epoch: 17   Global Step: 216140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:11:59,148-Speed 3053.37 samples/sec   Loss 1.6086   LearningRate 0.0017   Epoch: 17   Global Step: 216150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:12:02,552-Speed 3009.15 samples/sec   Loss 1.5527   LearningRate 0.0017   Epoch: 17   Global Step: 216160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:12:05,980-Speed 2987.99 samples/sec   Loss 1.6142   LearningRate 0.0017   Epoch: 17   Global Step: 216170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:12:09,354-Speed 3036.11 samples/sec   Loss 1.6266   LearningRate 0.0017   Epoch: 17   Global Step: 216180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:12:12,748-Speed 3017.86 samples/sec   Loss 1.6085   LearningRate 0.0017   Epoch: 17   Global Step: 216190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:12:16,240-Speed 2933.17 samples/sec   Loss 1.5929   LearningRate 0.0017   Epoch: 17   Global Step: 216200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:12:19,630-Speed 3021.89 samples/sec   Loss 1.5691   LearningRate 0.0017   Epoch: 17   Global Step: 216210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:12:23,033-Speed 3009.55 samples/sec   Loss 1.5562   LearningRate 0.0017   Epoch: 17   Global Step: 216220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:12:26,375-Speed 3065.04 samples/sec   Loss 1.6029   LearningRate 0.0017   Epoch: 17   Global Step: 216230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:12:29,729-Speed 3053.70 samples/sec   Loss 1.6364   LearningRate 0.0017   Epoch: 17   Global Step: 216240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:12:33,120-Speed 3020.94 samples/sec   Loss 1.6006   LearningRate 0.0017   Epoch: 17   Global Step: 216250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:12:36,484-Speed 3045.27 samples/sec   Loss 1.5755   LearningRate 0.0017   Epoch: 17   Global Step: 216260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:12:39,848-Speed 3044.79 samples/sec   Loss 1.6318   LearningRate 0.0017   Epoch: 17   Global Step: 216270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:12:43,239-Speed 3020.68 samples/sec   Loss 1.5677   LearningRate 0.0017   Epoch: 17   Global Step: 216280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:12:46,641-Speed 3010.59 samples/sec   Loss 1.5346   LearningRate 0.0017   Epoch: 17   Global Step: 216290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:12:50,008-Speed 3042.14 samples/sec   Loss 1.5993   LearningRate 0.0017   Epoch: 17   Global Step: 216300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:12:53,406-Speed 3014.79 samples/sec   Loss 1.5969   LearningRate 0.0017   Epoch: 17   Global Step: 216310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:12:56,801-Speed 3016.80 samples/sec   Loss 1.5946   LearningRate 0.0017   Epoch: 17   Global Step: 216320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:00,138-Speed 3069.82 samples/sec   Loss 1.6620   LearningRate 0.0017   Epoch: 17   Global Step: 216330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:03,533-Speed 3018.03 samples/sec   Loss 1.6509   LearningRate 0.0017   Epoch: 17   Global Step: 216340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:06,936-Speed 3008.97 samples/sec   Loss 1.5515   LearningRate 0.0017   Epoch: 17   Global Step: 216350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:10,414-Speed 2946.14 samples/sec   Loss 1.5517   LearningRate 0.0017   Epoch: 17   Global Step: 216360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:13,822-Speed 3005.30 samples/sec   Loss 1.5444   LearningRate 0.0017   Epoch: 17   Global Step: 216370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:17,240-Speed 2996.39 samples/sec   Loss 1.5965   LearningRate 0.0017   Epoch: 17   Global Step: 216380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:20,633-Speed 3020.86 samples/sec   Loss 1.5847   LearningRate 0.0017   Epoch: 17   Global Step: 216390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:24,157-Speed 2906.34 samples/sec   Loss 1.6304   LearningRate 0.0017   Epoch: 17   Global Step: 216400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:27,611-Speed 2965.47 samples/sec   Loss 1.6069   LearningRate 0.0017   Epoch: 17   Global Step: 216410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:13:31,107-Speed 2929.37 samples/sec   Loss 1.6331   LearningRate 0.0017   Epoch: 17   Global Step: 216420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:13:34,513-Speed 3008.04 samples/sec   Loss 1.5990   LearningRate 0.0017   Epoch: 17   Global Step: 216430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:37,957-Speed 2973.78 samples/sec   Loss 1.6480   LearningRate 0.0017   Epoch: 17   Global Step: 216440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:41,361-Speed 3009.43 samples/sec   Loss 1.5868   LearningRate 0.0017   Epoch: 17   Global Step: 216450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:44,816-Speed 2964.57 samples/sec   Loss 1.5553   LearningRate 0.0017   Epoch: 17   Global Step: 216460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:48,221-Speed 3007.97 samples/sec   Loss 1.5680   LearningRate 0.0017   Epoch: 17   Global Step: 216470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:51,691-Speed 2952.20 samples/sec   Loss 1.5731   LearningRate 0.0017   Epoch: 17   Global Step: 216480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:54,998-Speed 3097.60 samples/sec   Loss 1.5979   LearningRate 0.0017   Epoch: 17   Global Step: 216490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:13:58,466-Speed 2952.83 samples/sec   Loss 1.6026   LearningRate 0.0017   Epoch: 17   Global Step: 216500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:14:01,906-Speed 2977.91 samples/sec   Loss 1.5581   LearningRate 0.0016   Epoch: 17   Global Step: 216510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:14:05,263-Speed 3051.31 samples/sec   Loss 1.5972   LearningRate 0.0016   Epoch: 17   Global Step: 216520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:14:08,627-Speed 3044.47 samples/sec   Loss 1.5873   LearningRate 0.0016   Epoch: 17   Global Step: 216530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:14:11,983-Speed 3052.61 samples/sec   Loss 1.5869   LearningRate 0.0016   Epoch: 17   Global Step: 216540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:14:15,352-Speed 3039.94 samples/sec   Loss 1.6090   LearningRate 0.0016   Epoch: 17   Global Step: 216550   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:14:18,827-Speed 2947.69 samples/sec   Loss 1.5901   LearningRate 0.0016   Epoch: 17   Global Step: 216560   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:14:22,212-Speed 3026.63 samples/sec   Loss 1.6487   LearningRate 0.0016   Epoch: 17   Global Step: 216570   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:14:25,627-Speed 2998.65 samples/sec   Loss 1.5880   LearningRate 0.0016   Epoch: 17   Global Step: 216580   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:14:28,995-Speed 3041.21 samples/sec   Loss 1.5993   LearningRate 0.0016   Epoch: 17   Global Step: 216590   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:14:32,439-Speed 2975.06 samples/sec   Loss 1.6507   LearningRate 0.0016   Epoch: 17   Global Step: 216600   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:14:35,891-Speed 2966.33 samples/sec   Loss 1.6065   LearningRate 0.0016   Epoch: 17   Global Step: 216610   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:14:39,276-Speed 3026.38 samples/sec   Loss 1.5931   LearningRate 0.0016   Epoch: 17   Global Step: 216620   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:14:42,634-Speed 3051.26 samples/sec   Loss 1.6188   LearningRate 0.0016   Epoch: 17   Global Step: 216630   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:14:46,070-Speed 2980.30 samples/sec   Loss 1.5950   LearningRate 0.0016   Epoch: 17   Global Step: 216640   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:14:49,495-Speed 2991.22 samples/sec   Loss 1.6100   LearningRate 0.0016   Epoch: 17   Global Step: 216650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:14:52,838-Speed 3063.73 samples/sec   Loss 1.5983   LearningRate 0.0016   Epoch: 17   Global Step: 216660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:14:56,205-Speed 3042.24 samples/sec   Loss 1.6545   LearningRate 0.0016   Epoch: 17   Global Step: 216670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:14:59,583-Speed 3031.66 samples/sec   Loss 1.6201   LearningRate 0.0016   Epoch: 17   Global Step: 216680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:15:02,987-Speed 3009.48 samples/sec   Loss 1.6061   LearningRate 0.0016   Epoch: 17   Global Step: 216690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:15:06,389-Speed 3010.94 samples/sec   Loss 1.6331   LearningRate 0.0016   Epoch: 17   Global Step: 216700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:15:09,865-Speed 2946.78 samples/sec   Loss 1.6303   LearningRate 0.0016   Epoch: 17   Global Step: 216710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:15:13,227-Speed 3046.73 samples/sec   Loss 1.6692   LearningRate 0.0016   Epoch: 17   Global Step: 216720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:15:16,568-Speed 3064.87 samples/sec   Loss 1.6211   LearningRate 0.0016   Epoch: 17   Global Step: 216730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:15:20,033-Speed 2956.54 samples/sec   Loss 1.5530   LearningRate 0.0016   Epoch: 17   Global Step: 216740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:15:23,394-Speed 3049.15 samples/sec   Loss 1.6255   LearningRate 0.0016   Epoch: 17   Global Step: 216750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:15:26,750-Speed 3052.08 samples/sec   Loss 1.5919   LearningRate 0.0016   Epoch: 17   Global Step: 216760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:15:30,218-Speed 2953.08 samples/sec   Loss 1.5790   LearningRate 0.0016   Epoch: 17   Global Step: 216770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:15:33,608-Speed 3022.55 samples/sec   Loss 1.5801   LearningRate 0.0016   Epoch: 17   Global Step: 216780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:15:37,034-Speed 2989.60 samples/sec   Loss 1.5939   LearningRate 0.0016   Epoch: 17   Global Step: 216790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:15:40,466-Speed 2983.96 samples/sec   Loss 1.6403   LearningRate 0.0016   Epoch: 17   Global Step: 216800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:15:43,867-Speed 3012.30 samples/sec   Loss 1.6013   LearningRate 0.0016   Epoch: 17   Global Step: 216810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:15:47,225-Speed 3050.57 samples/sec   Loss 1.5618   LearningRate 0.0016   Epoch: 17   Global Step: 216820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:15:50,588-Speed 3045.09 samples/sec   Loss 1.5825   LearningRate 0.0016   Epoch: 17   Global Step: 216830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:15:54,001-Speed 3001.54 samples/sec   Loss 1.5865   LearningRate 0.0016   Epoch: 17   Global Step: 216840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:15:57,454-Speed 2966.30 samples/sec   Loss 1.5721   LearningRate 0.0016   Epoch: 17   Global Step: 216850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:16:00,885-Speed 2985.51 samples/sec   Loss 1.6448   LearningRate 0.0016   Epoch: 17   Global Step: 216860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:16:04,254-Speed 3040.20 samples/sec   Loss 1.6759   LearningRate 0.0016   Epoch: 17   Global Step: 216870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:16:07,693-Speed 2978.05 samples/sec   Loss 1.5985   LearningRate 0.0016   Epoch: 17   Global Step: 216880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:16:11,036-Speed 3064.37 samples/sec   Loss 1.5730   LearningRate 0.0016   Epoch: 17   Global Step: 216890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:16:14,557-Speed 2908.81 samples/sec   Loss 1.6693   LearningRate 0.0016   Epoch: 17   Global Step: 216900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:16:17,936-Speed 3031.24 samples/sec   Loss 1.5861   LearningRate 0.0016   Epoch: 17   Global Step: 216910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:16:21,377-Speed 2976.57 samples/sec   Loss 1.5930   LearningRate 0.0016   Epoch: 17   Global Step: 216920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:16:24,696-Speed 3085.88 samples/sec   Loss 1.5860   LearningRate 0.0016   Epoch: 17   Global Step: 216930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:16:28,110-Speed 3000.04 samples/sec   Loss 1.5744   LearningRate 0.0016   Epoch: 17   Global Step: 216940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:16:31,470-Speed 3049.20 samples/sec   Loss 1.6100   LearningRate 0.0016   Epoch: 17   Global Step: 216950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:16:34,822-Speed 3055.61 samples/sec   Loss 1.6108   LearningRate 0.0016   Epoch: 17   Global Step: 216960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:16:38,187-Speed 3043.56 samples/sec   Loss 1.6609   LearningRate 0.0016   Epoch: 17   Global Step: 216970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:16:41,605-Speed 2997.16 samples/sec   Loss 1.5952   LearningRate 0.0016   Epoch: 17   Global Step: 216980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:16:44,916-Speed 3093.80 samples/sec   Loss 1.6285   LearningRate 0.0016   Epoch: 17   Global Step: 216990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:16:48,228-Speed 3092.65 samples/sec   Loss 1.5897   LearningRate 0.0016   Epoch: 17   Global Step: 217000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:16:51,604-Speed 3034.09 samples/sec   Loss 1.6634   LearningRate 0.0016   Epoch: 17   Global Step: 217010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:16:54,939-Speed 3070.89 samples/sec   Loss 1.6383   LearningRate 0.0016   Epoch: 17   Global Step: 217020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:16:58,332-Speed 3018.66 samples/sec   Loss 1.6104   LearningRate 0.0016   Epoch: 17   Global Step: 217030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:17:01,691-Speed 3050.15 samples/sec   Loss 1.6360   LearningRate 0.0016   Epoch: 17   Global Step: 217040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:17:05,097-Speed 3007.33 samples/sec   Loss 1.6596   LearningRate 0.0016   Epoch: 17   Global Step: 217050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:17:08,574-Speed 2945.02 samples/sec   Loss 1.6310   LearningRate 0.0016   Epoch: 17   Global Step: 217060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:17:11,941-Speed 3042.52 samples/sec   Loss 1.5797   LearningRate 0.0016   Epoch: 17   Global Step: 217070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:17:15,339-Speed 3014.26 samples/sec   Loss 1.6180   LearningRate 0.0016   Epoch: 17   Global Step: 217080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:17:18,770-Speed 2985.76 samples/sec   Loss 1.6226   LearningRate 0.0016   Epoch: 17   Global Step: 217090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:17:22,190-Speed 2994.56 samples/sec   Loss 1.6027   LearningRate 0.0016   Epoch: 17   Global Step: 217100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:17:25,557-Speed 3042.05 samples/sec   Loss 1.6376   LearningRate 0.0016   Epoch: 17   Global Step: 217110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:17:28,946-Speed 3023.07 samples/sec   Loss 1.6375   LearningRate 0.0016   Epoch: 17   Global Step: 217120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:17:32,452-Speed 2921.33 samples/sec   Loss 1.5295   LearningRate 0.0016   Epoch: 17   Global Step: 217130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:17:35,902-Speed 2968.36 samples/sec   Loss 1.5914   LearningRate 0.0016   Epoch: 17   Global Step: 217140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:17:39,329-Speed 2988.68 samples/sec   Loss 1.6505   LearningRate 0.0016   Epoch: 17   Global Step: 217150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:17:42,792-Speed 2958.99 samples/sec   Loss 1.6229   LearningRate 0.0016   Epoch: 17   Global Step: 217160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:17:46,183-Speed 3020.03 samples/sec   Loss 1.6080   LearningRate 0.0016   Epoch: 17   Global Step: 217170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:17:49,555-Speed 3038.50 samples/sec   Loss 1.5979   LearningRate 0.0016   Epoch: 17   Global Step: 217180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:17:52,906-Speed 3056.08 samples/sec   Loss 1.5982   LearningRate 0.0016   Epoch: 17   Global Step: 217190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:17:56,317-Speed 3002.59 samples/sec   Loss 1.5177   LearningRate 0.0016   Epoch: 17   Global Step: 217200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:17:59,672-Speed 3053.39 samples/sec   Loss 1.6251   LearningRate 0.0016   Epoch: 17   Global Step: 217210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:03,023-Speed 3056.67 samples/sec   Loss 1.6271   LearningRate 0.0016   Epoch: 17   Global Step: 217220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:06,440-Speed 2996.64 samples/sec   Loss 1.5581   LearningRate 0.0016   Epoch: 17   Global Step: 217230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:09,817-Speed 3033.21 samples/sec   Loss 1.5935   LearningRate 0.0016   Epoch: 17   Global Step: 217240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:13,175-Speed 3051.17 samples/sec   Loss 1.6035   LearningRate 0.0016   Epoch: 17   Global Step: 217250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:16,532-Speed 3050.83 samples/sec   Loss 1.6554   LearningRate 0.0016   Epoch: 17   Global Step: 217260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:19,894-Speed 3047.26 samples/sec   Loss 1.5839   LearningRate 0.0016   Epoch: 17   Global Step: 217270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:23,253-Speed 3049.80 samples/sec   Loss 1.6262   LearningRate 0.0016   Epoch: 17   Global Step: 217280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:26,598-Speed 3061.53 samples/sec   Loss 1.5972   LearningRate 0.0016   Epoch: 17   Global Step: 217290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:29,985-Speed 3024.41 samples/sec   Loss 1.6242   LearningRate 0.0016   Epoch: 17   Global Step: 217300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:33,328-Speed 3063.48 samples/sec   Loss 1.6054   LearningRate 0.0016   Epoch: 17   Global Step: 217310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:18:36,686-Speed 3050.19 samples/sec   Loss 1.5777   LearningRate 0.0016   Epoch: 17   Global Step: 217320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:18:40,087-Speed 3012.40 samples/sec   Loss 1.6268   LearningRate 0.0016   Epoch: 17   Global Step: 217330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:43,460-Speed 3036.54 samples/sec   Loss 1.6618   LearningRate 0.0016   Epoch: 17   Global Step: 217340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:46,810-Speed 3057.23 samples/sec   Loss 1.6350   LearningRate 0.0016   Epoch: 17   Global Step: 217350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:50,230-Speed 2995.21 samples/sec   Loss 1.6131   LearningRate 0.0016   Epoch: 17   Global Step: 217360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:53,702-Speed 2949.78 samples/sec   Loss 1.5441   LearningRate 0.0016   Epoch: 17   Global Step: 217370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:18:57,014-Speed 3092.70 samples/sec   Loss 1.6588   LearningRate 0.0016   Epoch: 17   Global Step: 217380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:00,410-Speed 3016.14 samples/sec   Loss 1.5881   LearningRate 0.0016   Epoch: 17   Global Step: 217390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:03,778-Speed 3041.58 samples/sec   Loss 1.6208   LearningRate 0.0016   Epoch: 17   Global Step: 217400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:07,147-Speed 3040.18 samples/sec   Loss 1.5974   LearningRate 0.0016   Epoch: 17   Global Step: 217410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:10,673-Speed 2904.89 samples/sec   Loss 1.6028   LearningRate 0.0016   Epoch: 17   Global Step: 217420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:14,061-Speed 3023.17 samples/sec   Loss 1.5606   LearningRate 0.0016   Epoch: 17   Global Step: 217430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:17,511-Speed 2968.67 samples/sec   Loss 1.5629   LearningRate 0.0016   Epoch: 17   Global Step: 217440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:20,906-Speed 3017.67 samples/sec   Loss 1.6234   LearningRate 0.0016   Epoch: 17   Global Step: 217450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:24,296-Speed 3020.90 samples/sec   Loss 1.5982   LearningRate 0.0016   Epoch: 17   Global Step: 217460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:27,711-Speed 2999.71 samples/sec   Loss 1.6022   LearningRate 0.0016   Epoch: 17   Global Step: 217470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:31,031-Speed 3085.10 samples/sec   Loss 1.6248   LearningRate 0.0016   Epoch: 17   Global Step: 217480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:19:34,431-Speed 3012.26 samples/sec   Loss 1.5774   LearningRate 0.0016   Epoch: 17   Global Step: 217490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:19:37,831-Speed 3012.91 samples/sec   Loss 1.5580   LearningRate 0.0015   Epoch: 17   Global Step: 217500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:19:41,181-Speed 3057.90 samples/sec   Loss 1.6111   LearningRate 0.0015   Epoch: 17   Global Step: 217510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:19:44,506-Speed 3079.99 samples/sec   Loss 1.6234   LearningRate 0.0015   Epoch: 17   Global Step: 217520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:47,859-Speed 3054.70 samples/sec   Loss 1.6514   LearningRate 0.0015   Epoch: 17   Global Step: 217530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:51,245-Speed 3026.32 samples/sec   Loss 1.5702   LearningRate 0.0015   Epoch: 17   Global Step: 217540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:54,661-Speed 2998.45 samples/sec   Loss 1.5836   LearningRate 0.0015   Epoch: 17   Global Step: 217550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:19:58,023-Speed 3045.88 samples/sec   Loss 1.6284   LearningRate 0.0015   Epoch: 17   Global Step: 217560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:01,456-Speed 2983.59 samples/sec   Loss 1.5847   LearningRate 0.0015   Epoch: 17   Global Step: 217570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:04,837-Speed 3029.40 samples/sec   Loss 1.5553   LearningRate 0.0015   Epoch: 17   Global Step: 217580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:08,214-Speed 3033.72 samples/sec   Loss 1.5858   LearningRate 0.0015   Epoch: 17   Global Step: 217590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:11,662-Speed 2970.65 samples/sec   Loss 1.6143   LearningRate 0.0015   Epoch: 17   Global Step: 217600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:15,133-Speed 2950.38 samples/sec   Loss 1.5881   LearningRate 0.0015   Epoch: 17   Global Step: 217610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:18,524-Speed 3021.34 samples/sec   Loss 1.5953   LearningRate 0.0015   Epoch: 17   Global Step: 217620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:20:21,881-Speed 3050.86 samples/sec   Loss 1.6878   LearningRate 0.0015   Epoch: 17   Global Step: 217630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:25,257-Speed 3034.26 samples/sec   Loss 1.6389   LearningRate 0.0015   Epoch: 17   Global Step: 217640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:28,735-Speed 2945.01 samples/sec   Loss 1.6094   LearningRate 0.0015   Epoch: 17   Global Step: 217650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:32,151-Speed 2998.23 samples/sec   Loss 1.6232   LearningRate 0.0015   Epoch: 17   Global Step: 217660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:35,549-Speed 3014.67 samples/sec   Loss 1.6380   LearningRate 0.0015   Epoch: 17   Global Step: 217670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:38,898-Speed 3058.69 samples/sec   Loss 1.6507   LearningRate 0.0015   Epoch: 17   Global Step: 217680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:42,221-Speed 3081.76 samples/sec   Loss 1.5947   LearningRate 0.0015   Epoch: 17   Global Step: 217690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:20:45,575-Speed 3053.80 samples/sec   Loss 1.6301   LearningRate 0.0015   Epoch: 17   Global Step: 217700   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:20:48,903-Speed 3077.40 samples/sec   Loss 1.6146   LearningRate 0.0015   Epoch: 17   Global Step: 217710   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:20:52,258-Speed 3053.55 samples/sec   Loss 1.5861   LearningRate 0.0015   Epoch: 17   Global Step: 217720   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:20:55,632-Speed 3035.83 samples/sec   Loss 1.5938   LearningRate 0.0015   Epoch: 17   Global Step: 217730   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:20:59,058-Speed 2989.64 samples/sec   Loss 1.5834   LearningRate 0.0015   Epoch: 17   Global Step: 217740   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:21:02,460-Speed 3010.44 samples/sec   Loss 1.6685   LearningRate 0.0015   Epoch: 17   Global Step: 217750   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:21:05,981-Speed 2909.63 samples/sec   Loss 1.6375   LearningRate 0.0015   Epoch: 17   Global Step: 217760   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:21:09,380-Speed 3013.24 samples/sec   Loss 1.6703   LearningRate 0.0015   Epoch: 17   Global Step: 217770   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:21:12,765-Speed 3026.79 samples/sec   Loss 1.6166   LearningRate 0.0015   Epoch: 17   Global Step: 217780   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:21:16,109-Speed 3062.83 samples/sec   Loss 1.6303   LearningRate 0.0015   Epoch: 17   Global Step: 217790   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:21:19,488-Speed 3031.14 samples/sec   Loss 1.6124   LearningRate 0.0015   Epoch: 17   Global Step: 217800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:21:22,909-Speed 2993.94 samples/sec   Loss 1.6075   LearningRate 0.0015   Epoch: 17   Global Step: 217810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:21:26,293-Speed 3027.16 samples/sec   Loss 1.6107   LearningRate 0.0015   Epoch: 17   Global Step: 217820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:21:29,715-Speed 2992.81 samples/sec   Loss 1.6395   LearningRate 0.0015   Epoch: 17   Global Step: 217830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:21:33,150-Speed 2981.94 samples/sec   Loss 1.5769   LearningRate 0.0015   Epoch: 17   Global Step: 217840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:21:36,518-Speed 3041.16 samples/sec   Loss 1.5968   LearningRate 0.0015   Epoch: 17   Global Step: 217850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:21:39,888-Speed 3039.63 samples/sec   Loss 1.5961   LearningRate 0.0015   Epoch: 17   Global Step: 217860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:21:43,283-Speed 3017.01 samples/sec   Loss 1.5736   LearningRate 0.0015   Epoch: 17   Global Step: 217870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:21:46,713-Speed 2985.91 samples/sec   Loss 1.6153   LearningRate 0.0015   Epoch: 17   Global Step: 217880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:21:50,145-Speed 2985.04 samples/sec   Loss 1.6081   LearningRate 0.0015   Epoch: 17   Global Step: 217890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:21:53,540-Speed 3017.57 samples/sec   Loss 1.5851   LearningRate 0.0015   Epoch: 17   Global Step: 217900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:21:57,002-Speed 2958.20 samples/sec   Loss 1.6134   LearningRate 0.0015   Epoch: 17   Global Step: 217910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:22:00,509-Speed 2921.37 samples/sec   Loss 1.6155   LearningRate 0.0015   Epoch: 17   Global Step: 217920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:22:03,840-Speed 3074.28 samples/sec   Loss 1.5777   LearningRate 0.0015   Epoch: 17   Global Step: 217930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:22:07,281-Speed 2976.64 samples/sec   Loss 1.6296   LearningRate 0.0015   Epoch: 17   Global Step: 217940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:22:10,738-Speed 2963.42 samples/sec   Loss 1.6495   LearningRate 0.0015   Epoch: 17   Global Step: 217950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:22:14,106-Speed 3040.82 samples/sec   Loss 1.6447   LearningRate 0.0015   Epoch: 17   Global Step: 217960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:22:17,500-Speed 3017.58 samples/sec   Loss 1.5538   LearningRate 0.0015   Epoch: 17   Global Step: 217970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:22:20,892-Speed 3020.46 samples/sec   Loss 1.6367   LearningRate 0.0015   Epoch: 17   Global Step: 217980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:22:24,264-Speed 3036.80 samples/sec   Loss 1.6177   LearningRate 0.0015   Epoch: 17   Global Step: 217990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:22:27,590-Speed 3080.26 samples/sec   Loss 1.6167   LearningRate 0.0015   Epoch: 17   Global Step: 218000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:22:31,064-Speed 2948.58 samples/sec   Loss 1.5693   LearningRate 0.0015   Epoch: 17   Global Step: 218010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:22:34,573-Speed 2918.74 samples/sec   Loss 1.6430   LearningRate 0.0015   Epoch: 17   Global Step: 218020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:22:37,940-Speed 3042.07 samples/sec   Loss 1.6268   LearningRate 0.0015   Epoch: 17   Global Step: 218030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:22:41,368-Speed 2987.87 samples/sec   Loss 1.5880   LearningRate 0.0015   Epoch: 17   Global Step: 218040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:22:44,812-Speed 2973.72 samples/sec   Loss 1.5729   LearningRate 0.0015   Epoch: 17   Global Step: 218050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:22:48,275-Speed 2958.45 samples/sec   Loss 1.6177   LearningRate 0.0015   Epoch: 17   Global Step: 218060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:22:51,644-Speed 3040.32 samples/sec   Loss 1.5900   LearningRate 0.0015   Epoch: 17   Global Step: 218070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:22:55,063-Speed 2997.17 samples/sec   Loss 1.6586   LearningRate 0.0015   Epoch: 17   Global Step: 218080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:22:58,472-Speed 3004.93 samples/sec   Loss 1.6414   LearningRate 0.0015   Epoch: 17   Global Step: 218090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:23:01,852-Speed 3030.28 samples/sec   Loss 1.5995   LearningRate 0.0015   Epoch: 17   Global Step: 218100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:23:05,199-Speed 3059.83 samples/sec   Loss 1.6262   LearningRate 0.0015   Epoch: 17   Global Step: 218110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:23:08,560-Speed 3048.27 samples/sec   Loss 1.6338   LearningRate 0.0015   Epoch: 17   Global Step: 218120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:23:11,969-Speed 3003.81 samples/sec   Loss 1.5760   LearningRate 0.0015   Epoch: 17   Global Step: 218130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:23:15,368-Speed 3014.05 samples/sec   Loss 1.6304   LearningRate 0.0015   Epoch: 17   Global Step: 218140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:23:18,715-Speed 3060.01 samples/sec   Loss 1.6469   LearningRate 0.0015   Epoch: 17   Global Step: 218150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:23:22,087-Speed 3037.78 samples/sec   Loss 1.5711   LearningRate 0.0015   Epoch: 17   Global Step: 218160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:23:25,388-Speed 3103.53 samples/sec   Loss 1.5939   LearningRate 0.0015   Epoch: 17   Global Step: 218170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:23:28,809-Speed 2994.14 samples/sec   Loss 1.6309   LearningRate 0.0015   Epoch: 17   Global Step: 218180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:23:32,210-Speed 3011.52 samples/sec   Loss 1.6165   LearningRate 0.0015   Epoch: 17   Global Step: 218190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:23:35,605-Speed 3016.12 samples/sec   Loss 1.5751   LearningRate 0.0015   Epoch: 17   Global Step: 218200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:23:38,997-Speed 3020.60 samples/sec   Loss 1.5921   LearningRate 0.0015   Epoch: 17   Global Step: 218210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:23:42,332-Speed 3070.67 samples/sec   Loss 1.6042   LearningRate 0.0015   Epoch: 17   Global Step: 218220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:23:45,655-Speed 3082.25 samples/sec   Loss 1.5915   LearningRate 0.0015   Epoch: 17   Global Step: 218230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:23:49,058-Speed 3010.46 samples/sec   Loss 1.6585   LearningRate 0.0015   Epoch: 17   Global Step: 218240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:23:52,482-Speed 2991.16 samples/sec   Loss 1.6171   LearningRate 0.0015   Epoch: 17   Global Step: 218250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:23:55,821-Speed 3067.69 samples/sec   Loss 1.5633   LearningRate 0.0015   Epoch: 17   Global Step: 218260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:23:59,168-Speed 3060.28 samples/sec   Loss 1.5892   LearningRate 0.0015   Epoch: 17   Global Step: 218270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:02,526-Speed 3050.12 samples/sec   Loss 1.6617   LearningRate 0.0015   Epoch: 17   Global Step: 218280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:05,870-Speed 3062.89 samples/sec   Loss 1.5875   LearningRate 0.0015   Epoch: 17   Global Step: 218290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:09,332-Speed 2959.43 samples/sec   Loss 1.6498   LearningRate 0.0015   Epoch: 17   Global Step: 218300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:12,738-Speed 3007.01 samples/sec   Loss 1.6469   LearningRate 0.0015   Epoch: 17   Global Step: 218310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:16,071-Speed 3073.31 samples/sec   Loss 1.5688   LearningRate 0.0015   Epoch: 17   Global Step: 218320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:19,452-Speed 3028.88 samples/sec   Loss 1.6152   LearningRate 0.0015   Epoch: 17   Global Step: 218330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:22,808-Speed 3052.15 samples/sec   Loss 1.6509   LearningRate 0.0015   Epoch: 17   Global Step: 218340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:26,152-Speed 3063.16 samples/sec   Loss 1.6164   LearningRate 0.0015   Epoch: 17   Global Step: 218350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:29,541-Speed 3022.45 samples/sec   Loss 1.6402   LearningRate 0.0015   Epoch: 17   Global Step: 218360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:33,025-Speed 2940.08 samples/sec   Loss 1.6023   LearningRate 0.0015   Epoch: 17   Global Step: 218370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:24:36,417-Speed 3019.58 samples/sec   Loss 1.5925   LearningRate 0.0015   Epoch: 17   Global Step: 218380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:39,856-Speed 2978.23 samples/sec   Loss 1.6867   LearningRate 0.0015   Epoch: 17   Global Step: 218390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:43,258-Speed 3010.79 samples/sec   Loss 1.6610   LearningRate 0.0015   Epoch: 17   Global Step: 218400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:46,672-Speed 3000.50 samples/sec   Loss 1.6540   LearningRate 0.0015   Epoch: 17   Global Step: 218410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:50,068-Speed 3016.61 samples/sec   Loss 1.6625   LearningRate 0.0015   Epoch: 17   Global Step: 218420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:53,494-Speed 2989.38 samples/sec   Loss 1.6459   LearningRate 0.0015   Epoch: 17   Global Step: 218430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:24:56,897-Speed 3010.15 samples/sec   Loss 1.6651   LearningRate 0.0015   Epoch: 17   Global Step: 218440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:25:00,364-Speed 2954.18 samples/sec   Loss 1.5690   LearningRate 0.0015   Epoch: 17   Global Step: 218450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:25:03,771-Speed 3005.94 samples/sec   Loss 1.5884   LearningRate 0.0015   Epoch: 17   Global Step: 218460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:25:07,124-Speed 3054.69 samples/sec   Loss 1.6018   LearningRate 0.0015   Epoch: 17   Global Step: 218470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:10,554-Speed 2986.86 samples/sec   Loss 1.5867   LearningRate 0.0015   Epoch: 17   Global Step: 218480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:14,000-Speed 2971.93 samples/sec   Loss 1.6234   LearningRate 0.0015   Epoch: 17   Global Step: 218490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:17,406-Speed 3007.63 samples/sec   Loss 1.6065   LearningRate 0.0015   Epoch: 17   Global Step: 218500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:20,799-Speed 3019.19 samples/sec   Loss 1.6559   LearningRate 0.0014   Epoch: 17   Global Step: 218510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:24,153-Speed 3053.84 samples/sec   Loss 1.5985   LearningRate 0.0014   Epoch: 17   Global Step: 218520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:27,551-Speed 3014.58 samples/sec   Loss 1.4974   LearningRate 0.0014   Epoch: 17   Global Step: 218530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:30,965-Speed 3000.10 samples/sec   Loss 1.5965   LearningRate 0.0014   Epoch: 17   Global Step: 218540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:34,423-Speed 2961.34 samples/sec   Loss 1.6626   LearningRate 0.0014   Epoch: 17   Global Step: 218550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:37,820-Speed 3015.08 samples/sec   Loss 1.5520   LearningRate 0.0014   Epoch: 17   Global Step: 218560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:41,268-Speed 2970.99 samples/sec   Loss 1.6056   LearningRate 0.0014   Epoch: 17   Global Step: 218570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:25:44,640-Speed 3038.02 samples/sec   Loss 1.6119   LearningRate 0.0014   Epoch: 17   Global Step: 218580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:25:47,996-Speed 3055.73 samples/sec   Loss 1.5994   LearningRate 0.0014   Epoch: 17   Global Step: 218590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:51,322-Speed 3079.30 samples/sec   Loss 1.5957   LearningRate 0.0014   Epoch: 17   Global Step: 218600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:54,732-Speed 3004.47 samples/sec   Loss 1.6486   LearningRate 0.0014   Epoch: 17   Global Step: 218610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:25:58,109-Speed 3032.33 samples/sec   Loss 1.6389   LearningRate 0.0014   Epoch: 17   Global Step: 218620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:26:01,547-Speed 2979.60 samples/sec   Loss 1.6767   LearningRate 0.0014   Epoch: 17   Global Step: 218630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:26:04,901-Speed 3054.09 samples/sec   Loss 1.6080   LearningRate 0.0014   Epoch: 17   Global Step: 218640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:26:08,318-Speed 2997.71 samples/sec   Loss 1.6642   LearningRate 0.0014   Epoch: 17   Global Step: 218650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:26:11,751-Speed 2983.63 samples/sec   Loss 1.5590   LearningRate 0.0014   Epoch: 17   Global Step: 218660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:26:15,168-Speed 2997.45 samples/sec   Loss 1.6608   LearningRate 0.0014   Epoch: 17   Global Step: 218670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:26:18,611-Speed 2974.96 samples/sec   Loss 1.6089   LearningRate 0.0014   Epoch: 17   Global Step: 218680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:26:21,980-Speed 3040.15 samples/sec   Loss 1.6453   LearningRate 0.0014   Epoch: 17   Global Step: 218690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:26:25,372-Speed 3019.93 samples/sec   Loss 1.5379   LearningRate 0.0014   Epoch: 17   Global Step: 218700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:26:28,762-Speed 3021.64 samples/sec   Loss 1.6167   LearningRate 0.0014   Epoch: 17   Global Step: 218710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:26:32,143-Speed 3029.13 samples/sec   Loss 1.5977   LearningRate 0.0014   Epoch: 17   Global Step: 218720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:26:35,592-Speed 2969.95 samples/sec   Loss 1.6067   LearningRate 0.0014   Epoch: 17   Global Step: 218730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:26:39,033-Speed 2976.80 samples/sec   Loss 1.5465   LearningRate 0.0014   Epoch: 17   Global Step: 218740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:26:42,531-Speed 2928.51 samples/sec   Loss 1.6202   LearningRate 0.0014   Epoch: 17   Global Step: 218750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:26:45,971-Speed 2977.44 samples/sec   Loss 1.5861   LearningRate 0.0014   Epoch: 17   Global Step: 218760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:26:49,385-Speed 3000.22 samples/sec   Loss 1.6071   LearningRate 0.0014   Epoch: 17   Global Step: 218770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:26:52,717-Speed 3074.11 samples/sec   Loss 1.6372   LearningRate 0.0014   Epoch: 17   Global Step: 218780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:26:56,109-Speed 3019.32 samples/sec   Loss 1.6313   LearningRate 0.0014   Epoch: 17   Global Step: 218790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:26:59,474-Speed 3043.89 samples/sec   Loss 1.6000   LearningRate 0.0014   Epoch: 17   Global Step: 218800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:27:02,832-Speed 3050.73 samples/sec   Loss 1.6404   LearningRate 0.0014   Epoch: 17   Global Step: 218810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:27:06,172-Speed 3066.54 samples/sec   Loss 1.6778   LearningRate 0.0014   Epoch: 17   Global Step: 218820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:27:09,562-Speed 3021.43 samples/sec   Loss 1.6496   LearningRate 0.0014   Epoch: 17   Global Step: 218830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:27:12,954-Speed 3020.25 samples/sec   Loss 1.6352   LearningRate 0.0014   Epoch: 17   Global Step: 218840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:27:16,291-Speed 3069.32 samples/sec   Loss 1.6488   LearningRate 0.0014   Epoch: 17   Global Step: 218850   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:27:19,738-Speed 2971.45 samples/sec   Loss 1.6272   LearningRate 0.0014   Epoch: 17   Global Step: 218860   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:27:23,180-Speed 2976.59 samples/sec   Loss 1.5846   LearningRate 0.0014   Epoch: 17   Global Step: 218870   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:27:26,610-Speed 2985.53 samples/sec   Loss 1.5800   LearningRate 0.0014   Epoch: 17   Global Step: 218880   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:27:30,023-Speed 3001.64 samples/sec   Loss 1.6270   LearningRate 0.0014   Epoch: 17   Global Step: 218890   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:27:33,459-Speed 2980.83 samples/sec   Loss 1.6282   LearningRate 0.0014   Epoch: 17   Global Step: 218900   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:27:36,813-Speed 3053.60 samples/sec   Loss 1.6397   LearningRate 0.0014   Epoch: 17   Global Step: 218910   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:27:40,229-Speed 2998.45 samples/sec   Loss 1.6944   LearningRate 0.0014   Epoch: 17   Global Step: 218920   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:27:43,632-Speed 3010.20 samples/sec   Loss 1.6412   LearningRate 0.0014   Epoch: 17   Global Step: 218930   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:27:47,087-Speed 2964.30 samples/sec   Loss 1.6020   LearningRate 0.0014   Epoch: 17   Global Step: 218940   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:27:50,490-Speed 3010.57 samples/sec   Loss 1.5865   LearningRate 0.0014   Epoch: 17   Global Step: 218950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:27:53,972-Speed 2941.76 samples/sec   Loss 1.5906   LearningRate 0.0014   Epoch: 17   Global Step: 218960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:27:57,340-Speed 3040.65 samples/sec   Loss 1.6344   LearningRate 0.0014   Epoch: 17   Global Step: 218970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:28:00,765-Speed 2992.63 samples/sec   Loss 1.6089   LearningRate 0.0014   Epoch: 17   Global Step: 218980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:28:04,256-Speed 2933.56 samples/sec   Loss 1.5712   LearningRate 0.0014   Epoch: 17   Global Step: 218990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:28:07,678-Speed 2992.99 samples/sec   Loss 1.5786   LearningRate 0.0014   Epoch: 17   Global Step: 219000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:28:11,108-Speed 2987.06 samples/sec   Loss 1.6292   LearningRate 0.0014   Epoch: 17   Global Step: 219010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:28:14,472-Speed 3044.48 samples/sec   Loss 1.6909   LearningRate 0.0014   Epoch: 17   Global Step: 219020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:28:17,845-Speed 3036.66 samples/sec   Loss 1.6494   LearningRate 0.0014   Epoch: 17   Global Step: 219030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:28:21,287-Speed 2976.03 samples/sec   Loss 1.6149   LearningRate 0.0014   Epoch: 17   Global Step: 219040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:28:24,612-Speed 3080.42 samples/sec   Loss 1.5856   LearningRate 0.0014   Epoch: 17   Global Step: 219050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:28:28,007-Speed 3017.00 samples/sec   Loss 1.6363   LearningRate 0.0014   Epoch: 17   Global Step: 219060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:28:31,418-Speed 3002.94 samples/sec   Loss 1.6480   LearningRate 0.0014   Epoch: 17   Global Step: 219070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:28:34,803-Speed 3026.05 samples/sec   Loss 1.6759   LearningRate 0.0014   Epoch: 17   Global Step: 219080   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:28:38,184-Speed 3029.03 samples/sec   Loss 1.6080   LearningRate 0.0014   Epoch: 17   Global Step: 219090   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:28:41,617-Speed 2984.57 samples/sec   Loss 1.6265   LearningRate 0.0014   Epoch: 17   Global Step: 219100   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:28:45,005-Speed 3022.60 samples/sec   Loss 1.6297   LearningRate 0.0014   Epoch: 17   Global Step: 219110   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:28:48,347-Speed 3064.98 samples/sec   Loss 1.6088   LearningRate 0.0014   Epoch: 17   Global Step: 219120   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:28:51,807-Speed 2960.69 samples/sec   Loss 1.5775   LearningRate 0.0014   Epoch: 17   Global Step: 219130   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:28:55,182-Speed 3034.82 samples/sec   Loss 1.6756   LearningRate 0.0014   Epoch: 17   Global Step: 219140   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:28:58,581-Speed 3013.84 samples/sec   Loss 1.6418   LearningRate 0.0014   Epoch: 17   Global Step: 219150   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:29:01,970-Speed 3021.74 samples/sec   Loss 1.6824   LearningRate 0.0014   Epoch: 17   Global Step: 219160   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:29:05,416-Speed 2972.65 samples/sec   Loss 1.6186   LearningRate 0.0014   Epoch: 17   Global Step: 219170   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:29:08,913-Speed 2928.73 samples/sec   Loss 1.6457   LearningRate 0.0014   Epoch: 17   Global Step: 219180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:12,352-Speed 2978.42 samples/sec   Loss 1.5969   LearningRate 0.0014   Epoch: 17   Global Step: 219190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:15,700-Speed 3060.09 samples/sec   Loss 1.6489   LearningRate 0.0014   Epoch: 17   Global Step: 219200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:19,090-Speed 3021.70 samples/sec   Loss 1.6457   LearningRate 0.0014   Epoch: 17   Global Step: 219210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:22,439-Speed 3057.79 samples/sec   Loss 1.6520   LearningRate 0.0014   Epoch: 17   Global Step: 219220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:25,789-Speed 3057.53 samples/sec   Loss 1.6042   LearningRate 0.0014   Epoch: 17   Global Step: 219230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:29,164-Speed 3035.51 samples/sec   Loss 1.6579   LearningRate 0.0014   Epoch: 17   Global Step: 219240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:32,577-Speed 3001.29 samples/sec   Loss 1.6481   LearningRate 0.0014   Epoch: 17   Global Step: 219250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:35,923-Speed 3061.35 samples/sec   Loss 1.5417   LearningRate 0.0014   Epoch: 17   Global Step: 219260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:39,272-Speed 3058.74 samples/sec   Loss 1.5894   LearningRate 0.0014   Epoch: 17   Global Step: 219270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:42,628-Speed 3052.06 samples/sec   Loss 1.5974   LearningRate 0.0014   Epoch: 17   Global Step: 219280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:29:46,079-Speed 2967.67 samples/sec   Loss 1.5981   LearningRate 0.0014   Epoch: 17   Global Step: 219290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:49,432-Speed 3054.81 samples/sec   Loss 1.5664   LearningRate 0.0014   Epoch: 17   Global Step: 219300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:52,853-Speed 2994.89 samples/sec   Loss 1.6368   LearningRate 0.0014   Epoch: 17   Global Step: 219310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:56,283-Speed 2985.69 samples/sec   Loss 1.6371   LearningRate 0.0014   Epoch: 17   Global Step: 219320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:29:59,679-Speed 3016.42 samples/sec   Loss 1.6143   LearningRate 0.0014   Epoch: 17   Global Step: 219330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:30:03,027-Speed 3059.00 samples/sec   Loss 1.6108   LearningRate 0.0014   Epoch: 17   Global Step: 219340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:30:07,156-Speed 2480.80 samples/sec   Loss 1.5843   LearningRate 0.0014   Epoch: 17   Global Step: 219350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:30:10,534-Speed 3031.86 samples/sec   Loss 1.6702   LearningRate 0.0014   Epoch: 17   Global Step: 219360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:30:14,029-Speed 2931.38 samples/sec   Loss 1.6073   LearningRate 0.0014   Epoch: 17   Global Step: 219370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:30:18,862-Speed 2119.12 samples/sec   Loss 1.6314   LearningRate 0.0014   Epoch: 17   Global Step: 219380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:30:22,317-Speed 2964.46 samples/sec   Loss 1.5973   LearningRate 0.0014   Epoch: 17   Global Step: 219390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:30:26,377-Speed 2522.57 samples/sec   Loss 1.5512   LearningRate 0.0014   Epoch: 17   Global Step: 219400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:30:30,416-Speed 2536.53 samples/sec   Loss 1.6492   LearningRate 0.0014   Epoch: 17   Global Step: 219410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:30:33,738-Speed 3082.87 samples/sec   Loss 1.6460   LearningRate 0.0014   Epoch: 17   Global Step: 219420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:30:37,087-Speed 3059.13 samples/sec   Loss 1.5802   LearningRate 0.0014   Epoch: 17   Global Step: 219430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:30:40,502-Speed 2999.20 samples/sec   Loss 1.5962   LearningRate 0.0014   Epoch: 17   Global Step: 219440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:30:43,863-Speed 3047.60 samples/sec   Loss 1.6479   LearningRate 0.0014   Epoch: 17   Global Step: 219450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:30:47,365-Speed 2925.27 samples/sec   Loss 1.6400   LearningRate 0.0014   Epoch: 17   Global Step: 219460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:30:50,771-Speed 3007.16 samples/sec   Loss 1.6823   LearningRate 0.0014   Epoch: 17   Global Step: 219470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:30:54,150-Speed 3031.26 samples/sec   Loss 1.6063   LearningRate 0.0014   Epoch: 17   Global Step: 219480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:30:57,555-Speed 3008.08 samples/sec   Loss 1.5954   LearningRate 0.0014   Epoch: 17   Global Step: 219490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:31:00,895-Speed 3067.22 samples/sec   Loss 1.5869   LearningRate 0.0014   Epoch: 17   Global Step: 219500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:31:04,214-Speed 3085.62 samples/sec   Loss 1.6304   LearningRate 0.0014   Epoch: 17   Global Step: 219510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:31:07,634-Speed 2994.84 samples/sec   Loss 1.5468   LearningRate 0.0014   Epoch: 17   Global Step: 219520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:31:11,015-Speed 3029.24 samples/sec   Loss 1.6179   LearningRate 0.0014   Epoch: 17   Global Step: 219530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:31:14,371-Speed 3052.15 samples/sec   Loss 1.6269   LearningRate 0.0014   Epoch: 17   Global Step: 219540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:31:17,782-Speed 3002.91 samples/sec   Loss 1.6112   LearningRate 0.0014   Epoch: 17   Global Step: 219550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:31:21,164-Speed 3028.10 samples/sec   Loss 1.6685   LearningRate 0.0013   Epoch: 17   Global Step: 219560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:31:24,602-Speed 2979.38 samples/sec   Loss 1.5906   LearningRate 0.0013   Epoch: 17   Global Step: 219570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:31:27,987-Speed 3026.28 samples/sec   Loss 1.5939   LearningRate 0.0013   Epoch: 17   Global Step: 219580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:31:31,499-Speed 2916.39 samples/sec   Loss 1.6643   LearningRate 0.0013   Epoch: 17   Global Step: 219590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:31:34,894-Speed 3016.95 samples/sec   Loss 1.5869   LearningRate 0.0013   Epoch: 17   Global Step: 219600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:31:38,351-Speed 2963.43 samples/sec   Loss 1.5769   LearningRate 0.0013   Epoch: 17   Global Step: 219610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:31:41,721-Speed 3039.09 samples/sec   Loss 1.5994   LearningRate 0.0013   Epoch: 17   Global Step: 219620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:31:45,045-Speed 3081.36 samples/sec   Loss 1.6425   LearningRate 0.0013   Epoch: 17   Global Step: 219630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:31:48,458-Speed 3001.08 samples/sec   Loss 1.5906   LearningRate 0.0013   Epoch: 17   Global Step: 219640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:31:51,817-Speed 3049.87 samples/sec   Loss 1.6047   LearningRate 0.0013   Epoch: 17   Global Step: 219650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:31:55,116-Speed 3104.31 samples/sec   Loss 1.5865   LearningRate 0.0013   Epoch: 17   Global Step: 219660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:31:58,486-Speed 3040.08 samples/sec   Loss 1.6044   LearningRate 0.0013   Epoch: 17   Global Step: 219670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:32:01,873-Speed 3023.39 samples/sec   Loss 1.6268   LearningRate 0.0013   Epoch: 17   Global Step: 219680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:32:05,179-Speed 3098.81 samples/sec   Loss 1.6038   LearningRate 0.0013   Epoch: 17   Global Step: 219690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:32:08,565-Speed 3025.28 samples/sec   Loss 1.6215   LearningRate 0.0013   Epoch: 17   Global Step: 219700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:32:12,012-Speed 2971.67 samples/sec   Loss 1.5422   LearningRate 0.0013   Epoch: 17   Global Step: 219710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:32:15,390-Speed 3031.61 samples/sec   Loss 1.6630   LearningRate 0.0013   Epoch: 17   Global Step: 219720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:32:18,745-Speed 3053.62 samples/sec   Loss 1.5976   LearningRate 0.0013   Epoch: 17   Global Step: 219730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:32:22,093-Speed 3059.33 samples/sec   Loss 1.6500   LearningRate 0.0013   Epoch: 17   Global Step: 219740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:32:26,088-Speed 2563.79 samples/sec   Loss 1.6081   LearningRate 0.0013   Epoch: 17   Global Step: 219750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:32:29,490-Speed 3010.39 samples/sec   Loss 1.6601   LearningRate 0.0013   Epoch: 17   Global Step: 219760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:32:33,485-Speed 2564.24 samples/sec   Loss 1.6243   LearningRate 0.0013   Epoch: 17   Global Step: 219770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:32:36,909-Speed 2991.48 samples/sec   Loss 1.6316   LearningRate 0.0013   Epoch: 17   Global Step: 219780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:32:40,224-Speed 3090.38 samples/sec   Loss 1.6283   LearningRate 0.0013   Epoch: 17   Global Step: 219790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:32:43,635-Speed 3002.04 samples/sec   Loss 1.6607   LearningRate 0.0013   Epoch: 17   Global Step: 219800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:32:47,068-Speed 2983.87 samples/sec   Loss 1.5886   LearningRate 0.0013   Epoch: 17   Global Step: 219810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:32:50,452-Speed 3026.86 samples/sec   Loss 1.6565   LearningRate 0.0013   Epoch: 17   Global Step: 219820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:32:53,835-Speed 3027.44 samples/sec   Loss 1.6311   LearningRate 0.0013   Epoch: 17   Global Step: 219830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:32:57,283-Speed 2970.63 samples/sec   Loss 1.6185   LearningRate 0.0013   Epoch: 17   Global Step: 219840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:33:00,682-Speed 3013.56 samples/sec   Loss 1.6754   LearningRate 0.0013   Epoch: 17   Global Step: 219850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:33:04,029-Speed 3060.44 samples/sec   Loss 1.5956   LearningRate 0.0013   Epoch: 17   Global Step: 219860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:33:07,492-Speed 2957.53 samples/sec   Loss 1.6496   LearningRate 0.0013   Epoch: 17   Global Step: 219870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:33:10,888-Speed 3016.61 samples/sec   Loss 1.6168   LearningRate 0.0013   Epoch: 17   Global Step: 219880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:33:14,248-Speed 3048.32 samples/sec   Loss 1.6400   LearningRate 0.0013   Epoch: 17   Global Step: 219890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:33:17,656-Speed 3005.13 samples/sec   Loss 1.6508   LearningRate 0.0013   Epoch: 17   Global Step: 219900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:33:21,103-Speed 2971.16 samples/sec   Loss 1.6378   LearningRate 0.0013   Epoch: 17   Global Step: 219910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:33:24,428-Speed 3080.97 samples/sec   Loss 1.5976   LearningRate 0.0013   Epoch: 17   Global Step: 219920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:33:27,769-Speed 3065.66 samples/sec   Loss 1.6696   LearningRate 0.0013   Epoch: 17   Global Step: 219930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:33:31,129-Speed 3048.62 samples/sec   Loss 1.7067   LearningRate 0.0013   Epoch: 17   Global Step: 219940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:33:34,480-Speed 3056.64 samples/sec   Loss 1.6247   LearningRate 0.0013   Epoch: 17   Global Step: 219950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:33:37,853-Speed 3037.23 samples/sec   Loss 1.5699   LearningRate 0.0013   Epoch: 17   Global Step: 219960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:33:41,275-Speed 2992.69 samples/sec   Loss 1.5830   LearningRate 0.0013   Epoch: 17   Global Step: 219970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:33:44,694-Speed 2995.62 samples/sec   Loss 1.6079   LearningRate 0.0013   Epoch: 17   Global Step: 219980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:33:48,114-Speed 2994.91 samples/sec   Loss 1.6383   LearningRate 0.0013   Epoch: 17   Global Step: 219990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:33:51,499-Speed 3026.02 samples/sec   Loss 1.6419   LearningRate 0.0013   Epoch: 17   Global Step: 220000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:33:54,886-Speed 3024.55 samples/sec   Loss 1.5617   LearningRate 0.0013   Epoch: 17   Global Step: 220010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:33:58,219-Speed 3072.81 samples/sec   Loss 1.6395   LearningRate 0.0013   Epoch: 17   Global Step: 220020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:01,612-Speed 3018.77 samples/sec   Loss 1.6309   LearningRate 0.0013   Epoch: 17   Global Step: 220030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:04,990-Speed 3032.15 samples/sec   Loss 1.6669   LearningRate 0.0013   Epoch: 17   Global Step: 220040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:08,417-Speed 2989.18 samples/sec   Loss 1.5884   LearningRate 0.0013   Epoch: 17   Global Step: 220050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:11,814-Speed 3015.28 samples/sec   Loss 1.5602   LearningRate 0.0013   Epoch: 17   Global Step: 220060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:34:15,195-Speed 3029.51 samples/sec   Loss 1.5441   LearningRate 0.0013   Epoch: 17   Global Step: 220070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:34:18,511-Speed 3089.20 samples/sec   Loss 1.6163   LearningRate 0.0013   Epoch: 17   Global Step: 220080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:21,870-Speed 3049.81 samples/sec   Loss 1.6014   LearningRate 0.0013   Epoch: 17   Global Step: 220090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:25,273-Speed 3010.19 samples/sec   Loss 1.5851   LearningRate 0.0013   Epoch: 17   Global Step: 220100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:28,616-Speed 3064.32 samples/sec   Loss 1.6721   LearningRate 0.0013   Epoch: 17   Global Step: 220110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:31,953-Speed 3069.17 samples/sec   Loss 1.6278   LearningRate 0.0013   Epoch: 17   Global Step: 220120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:35,393-Speed 2977.21 samples/sec   Loss 1.5879   LearningRate 0.0013   Epoch: 17   Global Step: 220130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:38,744-Speed 3056.85 samples/sec   Loss 1.6230   LearningRate 0.0013   Epoch: 17   Global Step: 220140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:42,077-Speed 3073.76 samples/sec   Loss 1.5348   LearningRate 0.0013   Epoch: 17   Global Step: 220150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:45,430-Speed 3053.86 samples/sec   Loss 1.6206   LearningRate 0.0013   Epoch: 17   Global Step: 220160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:48,835-Speed 3008.39 samples/sec   Loss 1.6698   LearningRate 0.0013   Epoch: 17   Global Step: 220170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:34:52,141-Speed 3099.97 samples/sec   Loss 1.5767   LearningRate 0.0013   Epoch: 17   Global Step: 220180   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:34:55,523-Speed 3028.90 samples/sec   Loss 1.5993   LearningRate 0.0013   Epoch: 17   Global Step: 220190   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:34:58,862-Speed 3067.31 samples/sec   Loss 1.7041   LearningRate 0.0013   Epoch: 17   Global Step: 220200   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:35:02,352-Speed 2934.69 samples/sec   Loss 1.6519   LearningRate 0.0013   Epoch: 17   Global Step: 220210   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:35:05,770-Speed 2997.17 samples/sec   Loss 1.5858   LearningRate 0.0013   Epoch: 17   Global Step: 220220   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:35:09,171-Speed 3011.75 samples/sec   Loss 1.6149   LearningRate 0.0013   Epoch: 17   Global Step: 220230   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:35:12,559-Speed 3023.11 samples/sec   Loss 1.5907   LearningRate 0.0013   Epoch: 17   Global Step: 220240   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:35:15,939-Speed 3029.54 samples/sec   Loss 1.6435   LearningRate 0.0013   Epoch: 17   Global Step: 220250   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:35:19,322-Speed 3028.30 samples/sec   Loss 1.6497   LearningRate 0.0013   Epoch: 17   Global Step: 220260   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:35:22,701-Speed 3031.14 samples/sec   Loss 1.6330   LearningRate 0.0013   Epoch: 17   Global Step: 220270   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:35:26,066-Speed 3043.88 samples/sec   Loss 1.5785   LearningRate 0.0013   Epoch: 17   Global Step: 220280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:35:29,462-Speed 3016.14 samples/sec   Loss 1.6892   LearningRate 0.0013   Epoch: 17   Global Step: 220290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:35:32,807-Speed 3062.19 samples/sec   Loss 1.6364   LearningRate 0.0013   Epoch: 17   Global Step: 220300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:35:36,261-Speed 2964.98 samples/sec   Loss 1.6090   LearningRate 0.0013   Epoch: 17   Global Step: 220310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:35:39,617-Speed 3052.08 samples/sec   Loss 1.6237   LearningRate 0.0013   Epoch: 17   Global Step: 220320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:35:42,972-Speed 3052.85 samples/sec   Loss 1.6165   LearningRate 0.0013   Epoch: 17   Global Step: 220330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:35:46,355-Speed 3028.23 samples/sec   Loss 1.6067   LearningRate 0.0013   Epoch: 17   Global Step: 220340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:35:49,708-Speed 3054.34 samples/sec   Loss 1.6390   LearningRate 0.0013   Epoch: 17   Global Step: 220350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:35:53,135-Speed 2989.16 samples/sec   Loss 1.6609   LearningRate 0.0013   Epoch: 17   Global Step: 220360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:35:56,509-Speed 3035.46 samples/sec   Loss 1.5830   LearningRate 0.0013   Epoch: 17   Global Step: 220370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:35:59,849-Speed 3066.47 samples/sec   Loss 1.6051   LearningRate 0.0013   Epoch: 17   Global Step: 220380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:03,277-Speed 2987.98 samples/sec   Loss 1.6115   LearningRate 0.0013   Epoch: 17   Global Step: 220390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:06,607-Speed 3076.48 samples/sec   Loss 1.6671   LearningRate 0.0013   Epoch: 17   Global Step: 220400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:09,977-Speed 3039.61 samples/sec   Loss 1.6518   LearningRate 0.0013   Epoch: 17   Global Step: 220410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:13,332-Speed 3052.34 samples/sec   Loss 1.5940   LearningRate 0.0013   Epoch: 17   Global Step: 220420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:16,784-Speed 2967.04 samples/sec   Loss 1.6164   LearningRate 0.0013   Epoch: 17   Global Step: 220430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:20,175-Speed 3021.05 samples/sec   Loss 1.6174   LearningRate 0.0013   Epoch: 17   Global Step: 220440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:23,530-Speed 3052.27 samples/sec   Loss 1.6275   LearningRate 0.0013   Epoch: 17   Global Step: 220450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:26,952-Speed 2993.66 samples/sec   Loss 1.5891   LearningRate 0.0013   Epoch: 17   Global Step: 220460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:30,341-Speed 3022.67 samples/sec   Loss 1.6410   LearningRate 0.0013   Epoch: 17   Global Step: 220470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:33,801-Speed 2960.23 samples/sec   Loss 1.5639   LearningRate 0.0013   Epoch: 17   Global Step: 220480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:36:37,160-Speed 3049.47 samples/sec   Loss 1.5378   LearningRate 0.0013   Epoch: 17   Global Step: 220490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:40,544-Speed 3027.26 samples/sec   Loss 1.6142   LearningRate 0.0013   Epoch: 17   Global Step: 220500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:43,976-Speed 2985.07 samples/sec   Loss 1.5921   LearningRate 0.0013   Epoch: 17   Global Step: 220510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:47,384-Speed 3005.14 samples/sec   Loss 1.5990   LearningRate 0.0013   Epoch: 17   Global Step: 220520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:50,804-Speed 2995.58 samples/sec   Loss 1.5552   LearningRate 0.0013   Epoch: 17   Global Step: 220530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:54,222-Speed 2996.64 samples/sec   Loss 1.5972   LearningRate 0.0013   Epoch: 17   Global Step: 220540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:36:57,608-Speed 3024.76 samples/sec   Loss 1.6110   LearningRate 0.0013   Epoch: 17   Global Step: 220550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:00,961-Speed 3055.01 samples/sec   Loss 1.6048   LearningRate 0.0013   Epoch: 17   Global Step: 220560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:04,391-Speed 2985.68 samples/sec   Loss 1.6943   LearningRate 0.0013   Epoch: 17   Global Step: 220570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:07,726-Speed 3071.44 samples/sec   Loss 1.6082   LearningRate 0.0013   Epoch: 17   Global Step: 220580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:11,108-Speed 3028.71 samples/sec   Loss 1.5917   LearningRate 0.0013   Epoch: 17   Global Step: 220590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:37:14,438-Speed 3075.64 samples/sec   Loss 1.6670   LearningRate 0.0013   Epoch: 17   Global Step: 220600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:17,829-Speed 3021.13 samples/sec   Loss 1.6108   LearningRate 0.0013   Epoch: 17   Global Step: 220610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:21,200-Speed 3038.66 samples/sec   Loss 1.6059   LearningRate 0.0013   Epoch: 17   Global Step: 220620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:24,629-Speed 2987.19 samples/sec   Loss 1.5768   LearningRate 0.0013   Epoch: 17   Global Step: 220630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:28,080-Speed 2967.76 samples/sec   Loss 1.6021   LearningRate 0.0013   Epoch: 17   Global Step: 220640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:31,512-Speed 2984.38 samples/sec   Loss 1.6343   LearningRate 0.0012   Epoch: 17   Global Step: 220650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:34,992-Speed 2943.20 samples/sec   Loss 1.6009   LearningRate 0.0012   Epoch: 17   Global Step: 220660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:38,417-Speed 2990.97 samples/sec   Loss 1.5785   LearningRate 0.0012   Epoch: 17   Global Step: 220670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:41,844-Speed 2988.70 samples/sec   Loss 1.6544   LearningRate 0.0012   Epoch: 17   Global Step: 220680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:45,292-Speed 2971.29 samples/sec   Loss 1.6215   LearningRate 0.0012   Epoch: 17   Global Step: 220690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:37:48,631-Speed 3067.41 samples/sec   Loss 1.5969   LearningRate 0.0012   Epoch: 17   Global Step: 220700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 22:37:51,993-Speed 3046.78 samples/sec   Loss 1.6252   LearningRate 0.0012   Epoch: 17   Global Step: 220710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:37:55,464-Speed 2950.92 samples/sec   Loss 1.6443   LearningRate 0.0012   Epoch: 17   Global Step: 220720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:37:58,826-Speed 3046.11 samples/sec   Loss 1.6541   LearningRate 0.0012   Epoch: 17   Global Step: 220730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:38:02,231-Speed 3008.35 samples/sec   Loss 1.5948   LearningRate 0.0012   Epoch: 17   Global Step: 220740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:38:05,596-Speed 3044.16 samples/sec   Loss 1.6888   LearningRate 0.0012   Epoch: 17   Global Step: 220750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:38:09,021-Speed 2990.49 samples/sec   Loss 1.5493   LearningRate 0.0012   Epoch: 17   Global Step: 220760   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:12,401-Speed 3031.14 samples/sec   Loss 1.6138   LearningRate 0.0012   Epoch: 17   Global Step: 220770   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:15,773-Speed 3037.03 samples/sec   Loss 1.6244   LearningRate 0.0012   Epoch: 17   Global Step: 220780   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:19,094-Speed 3084.37 samples/sec   Loss 1.6971   LearningRate 0.0012   Epoch: 17   Global Step: 220790   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:22,465-Speed 3038.11 samples/sec   Loss 1.6429   LearningRate 0.0012   Epoch: 17   Global Step: 220800   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:25,836-Speed 3038.79 samples/sec   Loss 1.6129   LearningRate 0.0012   Epoch: 17   Global Step: 220810   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:29,264-Speed 2988.11 samples/sec   Loss 1.6110   LearningRate 0.0012   Epoch: 17   Global Step: 220820   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:32,636-Speed 3038.12 samples/sec   Loss 1.5834   LearningRate 0.0012   Epoch: 17   Global Step: 220830   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:36,073-Speed 2979.93 samples/sec   Loss 1.5871   LearningRate 0.0012   Epoch: 17   Global Step: 220840   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:39,467-Speed 3018.15 samples/sec   Loss 1.6200   LearningRate 0.0012   Epoch: 17   Global Step: 220850   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:42,782-Speed 3089.81 samples/sec   Loss 1.6617   LearningRate 0.0012   Epoch: 17   Global Step: 220860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:38:46,117-Speed 3071.14 samples/sec   Loss 1.6027   LearningRate 0.0012   Epoch: 17   Global Step: 220870   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:49,533-Speed 2998.31 samples/sec   Loss 1.6425   LearningRate 0.0012   Epoch: 17   Global Step: 220880   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:52,984-Speed 2967.41 samples/sec   Loss 1.6111   LearningRate 0.0012   Epoch: 17   Global Step: 220890   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:56,330-Speed 3061.56 samples/sec   Loss 1.6856   LearningRate 0.0012   Epoch: 17   Global Step: 220900   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:38:59,777-Speed 2971.51 samples/sec   Loss 1.5922   LearningRate 0.0012   Epoch: 17   Global Step: 220910   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:39:03,221-Speed 2975.31 samples/sec   Loss 1.5221   LearningRate 0.0012   Epoch: 17   Global Step: 220920   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:39:06,613-Speed 3019.42 samples/sec   Loss 1.6167   LearningRate 0.0012   Epoch: 17   Global Step: 220930   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:39:10,061-Speed 2971.08 samples/sec   Loss 1.6207   LearningRate 0.0012   Epoch: 17   Global Step: 220940   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:39:13,491-Speed 2986.30 samples/sec   Loss 1.6786   LearningRate 0.0012   Epoch: 17   Global Step: 220950   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:39:16,890-Speed 3013.73 samples/sec   Loss 1.6610   LearningRate 0.0012   Epoch: 17   Global Step: 220960   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 22:39:20,364-Speed 2947.72 samples/sec   Loss 1.5632   LearningRate 0.0012   Epoch: 17   Global Step: 220970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:39:23,755-Speed 3020.94 samples/sec   Loss 1.5974   LearningRate 0.0012   Epoch: 17   Global Step: 220980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:39:27,131-Speed 3034.22 samples/sec   Loss 1.6284   LearningRate 0.0012   Epoch: 17   Global Step: 220990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:39:30,607-Speed 2946.74 samples/sec   Loss 1.6012   LearningRate 0.0012   Epoch: 17   Global Step: 221000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:39:33,950-Speed 3064.10 samples/sec   Loss 1.6478   LearningRate 0.0012   Epoch: 17   Global Step: 221010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:39:37,280-Speed 3075.42 samples/sec   Loss 1.6527   LearningRate 0.0012   Epoch: 17   Global Step: 221020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:39:40,725-Speed 2973.75 samples/sec   Loss 1.6707   LearningRate 0.0012   Epoch: 17   Global Step: 221030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:39:44,118-Speed 3018.88 samples/sec   Loss 1.5697   LearningRate 0.0012   Epoch: 17   Global Step: 221040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:39:47,476-Speed 3049.80 samples/sec   Loss 1.6082   LearningRate 0.0012   Epoch: 17   Global Step: 221050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:39:50,822-Speed 3060.88 samples/sec   Loss 1.6017   LearningRate 0.0012   Epoch: 17   Global Step: 221060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:39:54,202-Speed 3030.61 samples/sec   Loss 1.5948   LearningRate 0.0012   Epoch: 17   Global Step: 221070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:39:57,531-Speed 3077.03 samples/sec   Loss 1.6262   LearningRate 0.0012   Epoch: 17   Global Step: 221080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:00,884-Speed 3054.59 samples/sec   Loss 1.6419   LearningRate 0.0012   Epoch: 17   Global Step: 221090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:04,294-Speed 3003.94 samples/sec   Loss 1.5818   LearningRate 0.0012   Epoch: 17   Global Step: 221100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:07,764-Speed 2952.17 samples/sec   Loss 1.5813   LearningRate 0.0012   Epoch: 17   Global Step: 221110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:11,191-Speed 2988.93 samples/sec   Loss 1.6038   LearningRate 0.0012   Epoch: 17   Global Step: 221120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:14,578-Speed 3023.98 samples/sec   Loss 1.6198   LearningRate 0.0012   Epoch: 17   Global Step: 221130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:17,901-Speed 3082.33 samples/sec   Loss 1.6312   LearningRate 0.0012   Epoch: 17   Global Step: 221140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:21,335-Speed 2983.01 samples/sec   Loss 1.5972   LearningRate 0.0012   Epoch: 17   Global Step: 221150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:24,660-Speed 3080.90 samples/sec   Loss 1.6120   LearningRate 0.0012   Epoch: 17   Global Step: 221160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:28,005-Speed 3061.47 samples/sec   Loss 1.6165   LearningRate 0.0012   Epoch: 17   Global Step: 221170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:31,437-Speed 2985.44 samples/sec   Loss 1.6381   LearningRate 0.0012   Epoch: 17   Global Step: 221180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:34,841-Speed 3008.77 samples/sec   Loss 1.5453   LearningRate 0.0012   Epoch: 17   Global Step: 221190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:38,207-Speed 3042.42 samples/sec   Loss 1.6518   LearningRate 0.0012   Epoch: 17   Global Step: 221200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:41,647-Speed 2977.79 samples/sec   Loss 1.6780   LearningRate 0.0012   Epoch: 17   Global Step: 221210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:45,052-Speed 3008.18 samples/sec   Loss 1.5851   LearningRate 0.0012   Epoch: 17   Global Step: 221220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:48,441-Speed 3023.05 samples/sec   Loss 1.6669   LearningRate 0.0012   Epoch: 17   Global Step: 221230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:51,914-Speed 2948.71 samples/sec   Loss 1.6120   LearningRate 0.0012   Epoch: 17   Global Step: 221240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:55,374-Speed 2960.50 samples/sec   Loss 1.6093   LearningRate 0.0012   Epoch: 17   Global Step: 221250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:40:58,794-Speed 2994.76 samples/sec   Loss 1.5974   LearningRate 0.0012   Epoch: 17   Global Step: 221260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:41:02,192-Speed 3015.05 samples/sec   Loss 1.5981   LearningRate 0.0012   Epoch: 17   Global Step: 221270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:41:05,542-Speed 3057.03 samples/sec   Loss 1.6316   LearningRate 0.0012   Epoch: 17   Global Step: 221280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:41:08,894-Speed 3056.40 samples/sec   Loss 1.5835   LearningRate 0.0012   Epoch: 17   Global Step: 221290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:41:12,310-Speed 2998.93 samples/sec   Loss 1.6674   LearningRate 0.0012   Epoch: 17   Global Step: 221300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:41:15,668-Speed 3049.75 samples/sec   Loss 1.6355   LearningRate 0.0012   Epoch: 17   Global Step: 221310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:41:19,046-Speed 3032.36 samples/sec   Loss 1.6041   LearningRate 0.0012   Epoch: 17   Global Step: 221320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:41:22,482-Speed 2981.39 samples/sec   Loss 1.6292   LearningRate 0.0012   Epoch: 17   Global Step: 221330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:41:25,991-Speed 2918.79 samples/sec   Loss 1.6265   LearningRate 0.0012   Epoch: 17   Global Step: 221340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:41:29,395-Speed 3009.24 samples/sec   Loss 1.5632   LearningRate 0.0012   Epoch: 17   Global Step: 221350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:41:32,850-Speed 2965.51 samples/sec   Loss 1.6277   LearningRate 0.0012   Epoch: 17   Global Step: 221360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:41:36,250-Speed 3011.73 samples/sec   Loss 1.6675   LearningRate 0.0012   Epoch: 17   Global Step: 221370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:41:39,707-Speed 2963.31 samples/sec   Loss 1.6162   LearningRate 0.0012   Epoch: 17   Global Step: 221380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:41:43,114-Speed 3006.44 samples/sec   Loss 1.6078   LearningRate 0.0012   Epoch: 17   Global Step: 221390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:41:46,561-Speed 2971.76 samples/sec   Loss 1.6846   LearningRate 0.0012   Epoch: 17   Global Step: 221400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:41:49,952-Speed 3021.04 samples/sec   Loss 1.6272   LearningRate 0.0012   Epoch: 17   Global Step: 221410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:41:53,386-Speed 2982.24 samples/sec   Loss 1.6043   LearningRate 0.0012   Epoch: 17   Global Step: 221420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:41:56,800-Speed 3000.54 samples/sec   Loss 1.6049   LearningRate 0.0012   Epoch: 17   Global Step: 221430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:00,271-Speed 2951.11 samples/sec   Loss 1.5632   LearningRate 0.0012   Epoch: 17   Global Step: 221440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:03,623-Speed 3055.84 samples/sec   Loss 1.6574   LearningRate 0.0012   Epoch: 17   Global Step: 221450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:06,970-Speed 3059.86 samples/sec   Loss 1.6777   LearningRate 0.0012   Epoch: 17   Global Step: 221460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:10,376-Speed 3008.19 samples/sec   Loss 1.6640   LearningRate 0.0012   Epoch: 17   Global Step: 221470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:13,860-Speed 2940.05 samples/sec   Loss 1.6237   LearningRate 0.0012   Epoch: 17   Global Step: 221480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:17,292-Speed 2983.96 samples/sec   Loss 1.6087   LearningRate 0.0012   Epoch: 17   Global Step: 221490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:20,662-Speed 3040.30 samples/sec   Loss 1.6035   LearningRate 0.0012   Epoch: 17   Global Step: 221500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:24,066-Speed 3008.57 samples/sec   Loss 1.5820   LearningRate 0.0012   Epoch: 17   Global Step: 221510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:27,395-Speed 3076.77 samples/sec   Loss 1.6090   LearningRate 0.0012   Epoch: 17   Global Step: 221520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:30,716-Speed 3084.66 samples/sec   Loss 1.6211   LearningRate 0.0012   Epoch: 17   Global Step: 221530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:34,027-Speed 3093.62 samples/sec   Loss 1.6111   LearningRate 0.0012   Epoch: 17   Global Step: 221540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:37,363-Speed 3070.24 samples/sec   Loss 1.6443   LearningRate 0.0012   Epoch: 17   Global Step: 221550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:40,771-Speed 3005.83 samples/sec   Loss 1.6470   LearningRate 0.0012   Epoch: 17   Global Step: 221560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:44,199-Speed 2987.51 samples/sec   Loss 1.6123   LearningRate 0.0012   Epoch: 17   Global Step: 221570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:47,618-Speed 2996.22 samples/sec   Loss 1.6149   LearningRate 0.0012   Epoch: 17   Global Step: 221580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:51,020-Speed 3011.18 samples/sec   Loss 1.5956   LearningRate 0.0012   Epoch: 17   Global Step: 221590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:54,375-Speed 3052.39 samples/sec   Loss 1.6198   LearningRate 0.0012   Epoch: 17   Global Step: 221600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:42:57,813-Speed 2979.36 samples/sec   Loss 1.6180   LearningRate 0.0012   Epoch: 17   Global Step: 221610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:43:01,211-Speed 3014.78 samples/sec   Loss 1.5575   LearningRate 0.0012   Epoch: 17   Global Step: 221620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:43:04,594-Speed 3027.53 samples/sec   Loss 1.6182   LearningRate 0.0012   Epoch: 17   Global Step: 221630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:43:07,953-Speed 3049.35 samples/sec   Loss 1.5613   LearningRate 0.0012   Epoch: 17   Global Step: 221640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:43:11,318-Speed 3044.04 samples/sec   Loss 1.5875   LearningRate 0.0012   Epoch: 17   Global Step: 221650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:43:14,721-Speed 3009.52 samples/sec   Loss 1.5808   LearningRate 0.0012   Epoch: 17   Global Step: 221660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:43:18,138-Speed 2998.30 samples/sec   Loss 1.6331   LearningRate 0.0012   Epoch: 17   Global Step: 221670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:43:21,521-Speed 3027.81 samples/sec   Loss 1.6335   LearningRate 0.0012   Epoch: 17   Global Step: 221680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:43:24,905-Speed 3026.37 samples/sec   Loss 1.6607   LearningRate 0.0012   Epoch: 17   Global Step: 221690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:43:28,335-Speed 2986.89 samples/sec   Loss 1.6539   LearningRate 0.0012   Epoch: 17   Global Step: 221700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:43:31,772-Speed 2979.79 samples/sec   Loss 1.6223   LearningRate 0.0012   Epoch: 17   Global Step: 221710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:43:35,180-Speed 3005.32 samples/sec   Loss 1.6048   LearningRate 0.0012   Epoch: 17   Global Step: 221720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:43:38,616-Speed 2981.04 samples/sec   Loss 1.6087   LearningRate 0.0012   Epoch: 17   Global Step: 221730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:43:42,095-Speed 2945.10 samples/sec   Loss 1.6459   LearningRate 0.0012   Epoch: 17   Global Step: 221740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:43:45,506-Speed 3002.42 samples/sec   Loss 1.5989   LearningRate 0.0012   Epoch: 17   Global Step: 221750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:43:48,875-Speed 3039.99 samples/sec   Loss 1.5849   LearningRate 0.0012   Epoch: 17   Global Step: 221760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:43:52,323-Speed 2971.33 samples/sec   Loss 1.6566   LearningRate 0.0012   Epoch: 17   Global Step: 221770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:43:55,701-Speed 3031.83 samples/sec   Loss 1.6401   LearningRate 0.0011   Epoch: 17   Global Step: 221780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:43:59,098-Speed 3014.99 samples/sec   Loss 1.6509   LearningRate 0.0011   Epoch: 17   Global Step: 221790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:44:02,456-Speed 3050.79 samples/sec   Loss 1.5511   LearningRate 0.0011   Epoch: 17   Global Step: 221800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:44:05,859-Speed 3010.01 samples/sec   Loss 1.6286   LearningRate 0.0011   Epoch: 17   Global Step: 221810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:44:09,353-Speed 2930.90 samples/sec   Loss 1.5851   LearningRate 0.0011   Epoch: 17   Global Step: 221820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:44:12,715-Speed 3047.36 samples/sec   Loss 1.6744   LearningRate 0.0011   Epoch: 17   Global Step: 221830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:44:16,027-Speed 3091.76 samples/sec   Loss 1.6493   LearningRate 0.0011   Epoch: 17   Global Step: 221840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:44:19,397-Speed 3039.80 samples/sec   Loss 1.6090   LearningRate 0.0011   Epoch: 17   Global Step: 221850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:44:22,870-Speed 2949.53 samples/sec   Loss 1.5674   LearningRate 0.0011   Epoch: 17   Global Step: 221860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:44:26,292-Speed 2992.61 samples/sec   Loss 1.5922   LearningRate 0.0011   Epoch: 17   Global Step: 221870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:44:29,699-Speed 3006.58 samples/sec   Loss 1.5805   LearningRate 0.0011   Epoch: 17   Global Step: 221880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:44:33,068-Speed 3040.38 samples/sec   Loss 1.6329   LearningRate 0.0011   Epoch: 17   Global Step: 221890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:44:36,429-Speed 3047.92 samples/sec   Loss 1.5936   LearningRate 0.0011   Epoch: 17   Global Step: 221900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:44:39,785-Speed 3052.20 samples/sec   Loss 1.6402   LearningRate 0.0011   Epoch: 17   Global Step: 221910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:44:43,156-Speed 3038.82 samples/sec   Loss 1.6063   LearningRate 0.0011   Epoch: 17   Global Step: 221920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:44:46,538-Speed 3027.96 samples/sec   Loss 1.5792   LearningRate 0.0011   Epoch: 17   Global Step: 221930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:44:49,891-Speed 3055.18 samples/sec   Loss 1.5809   LearningRate 0.0011   Epoch: 17   Global Step: 221940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:44:53,343-Speed 2967.13 samples/sec   Loss 1.6059   LearningRate 0.0011   Epoch: 17   Global Step: 221950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 22:44:56,743-Speed 3012.31 samples/sec   Loss 1.6705   LearningRate 0.0011   Epoch: 17   Global Step: 221960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:45:00,146-Speed 3010.52 samples/sec   Loss 1.6543   LearningRate 0.0011   Epoch: 17   Global Step: 221970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 22:45:03,498-Speed 3055.64 samples/sec   Loss 1.5948   LearningRate 0.0011   Epoch: 17   Global Step: 221980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:45:06,854-Speed 3051.81 samples/sec   Loss 1.5985   LearningRate 0.0011   Epoch: 17   Global Step: 221990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:45:10,320-Speed 2955.61 samples/sec   Loss 1.6581   LearningRate 0.0011   Epoch: 17   Global Step: 222000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:45:13,741-Speed 2994.26 samples/sec   Loss 1.6249   LearningRate 0.0011   Epoch: 17   Global Step: 222010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:45:17,103-Speed 3046.04 samples/sec   Loss 1.5985   LearningRate 0.0011   Epoch: 17   Global Step: 222020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:45:20,538-Speed 2982.59 samples/sec   Loss 1.6662   LearningRate 0.0011   Epoch: 17   Global Step: 222030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:45:23,911-Speed 3036.78 samples/sec   Loss 1.6499   LearningRate 0.0011   Epoch: 17   Global Step: 222040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:45:27,298-Speed 3023.55 samples/sec   Loss 1.6421   LearningRate 0.0011   Epoch: 17   Global Step: 222050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:45:30,661-Speed 3046.02 samples/sec   Loss 1.6179   LearningRate 0.0011   Epoch: 17   Global Step: 222060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:45:34,062-Speed 3011.52 samples/sec   Loss 1.5877   LearningRate 0.0011   Epoch: 17   Global Step: 222070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:45:37,435-Speed 3036.80 samples/sec   Loss 1.5912   LearningRate 0.0011   Epoch: 17   Global Step: 222080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:45:40,870-Speed 2981.94 samples/sec   Loss 1.6330   LearningRate 0.0011   Epoch: 17   Global Step: 222090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:45:44,227-Speed 3051.67 samples/sec   Loss 1.5667   LearningRate 0.0011   Epoch: 17   Global Step: 222100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:45:47,682-Speed 2964.18 samples/sec   Loss 1.6179   LearningRate 0.0011   Epoch: 17   Global Step: 222110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:45:51,136-Speed 2965.27 samples/sec   Loss 1.5955   LearningRate 0.0011   Epoch: 17   Global Step: 222120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:45:54,526-Speed 3021.97 samples/sec   Loss 1.6244   LearningRate 0.0011   Epoch: 17   Global Step: 222130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:45:57,932-Speed 3006.88 samples/sec   Loss 1.6841   LearningRate 0.0011   Epoch: 17   Global Step: 222140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:46:01,266-Speed 3072.75 samples/sec   Loss 1.6135   LearningRate 0.0011   Epoch: 17   Global Step: 222150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:46:04,623-Speed 3050.58 samples/sec   Loss 1.6417   LearningRate 0.0011   Epoch: 17   Global Step: 222160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:46:07,972-Speed 3058.57 samples/sec   Loss 1.6281   LearningRate 0.0011   Epoch: 17   Global Step: 222170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:46:11,382-Speed 3004.31 samples/sec   Loss 1.6046   LearningRate 0.0011   Epoch: 17   Global Step: 222180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:46:14,775-Speed 3018.56 samples/sec   Loss 1.5983   LearningRate 0.0011   Epoch: 17   Global Step: 222190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:46:18,140-Speed 3045.13 samples/sec   Loss 1.6047   LearningRate 0.0011   Epoch: 17   Global Step: 222200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:46:21,609-Speed 2952.15 samples/sec   Loss 1.6017   LearningRate 0.0011   Epoch: 17   Global Step: 222210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:46:25,011-Speed 3011.58 samples/sec   Loss 1.5868   LearningRate 0.0011   Epoch: 17   Global Step: 222220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:46:28,456-Speed 2973.30 samples/sec   Loss 1.5972   LearningRate 0.0011   Epoch: 17   Global Step: 222230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:46:31,792-Speed 3070.01 samples/sec   Loss 1.6392   LearningRate 0.0011   Epoch: 17   Global Step: 222240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:46:35,153-Speed 3048.03 samples/sec   Loss 1.5612   LearningRate 0.0011   Epoch: 17   Global Step: 222250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:46:38,529-Speed 3033.65 samples/sec   Loss 1.6138   LearningRate 0.0011   Epoch: 17   Global Step: 222260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:46:41,902-Speed 3036.75 samples/sec   Loss 1.6384   LearningRate 0.0011   Epoch: 17   Global Step: 222270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:46:45,260-Speed 3050.37 samples/sec   Loss 1.5767   LearningRate 0.0011   Epoch: 17   Global Step: 222280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:46:48,605-Speed 3061.87 samples/sec   Loss 1.5914   LearningRate 0.0011   Epoch: 17   Global Step: 222290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:46:52,067-Speed 2958.93 samples/sec   Loss 1.6248   LearningRate 0.0011   Epoch: 17   Global Step: 222300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 22:46:55,459-Speed 3019.08 samples/sec   Loss 1.6132   LearningRate 0.0011   Epoch: 17   Global Step: 222310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:46:58,865-Speed 3007.76 samples/sec   Loss 1.6469   LearningRate 0.0011   Epoch: 17   Global Step: 222320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:02,275-Speed 3003.40 samples/sec   Loss 1.6340   LearningRate 0.0011   Epoch: 17   Global Step: 222330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:05,629-Speed 3054.82 samples/sec   Loss 1.6237   LearningRate 0.0011   Epoch: 17   Global Step: 222340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:08,982-Speed 3054.78 samples/sec   Loss 1.5743   LearningRate 0.0011   Epoch: 17   Global Step: 222350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:12,384-Speed 3010.64 samples/sec   Loss 1.6240   LearningRate 0.0011   Epoch: 17   Global Step: 222360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:15,798-Speed 3000.28 samples/sec   Loss 1.6054   LearningRate 0.0011   Epoch: 17   Global Step: 222370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:19,213-Speed 3000.05 samples/sec   Loss 1.6274   LearningRate 0.0011   Epoch: 17   Global Step: 222380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:22,604-Speed 3021.23 samples/sec   Loss 1.6081   LearningRate 0.0011   Epoch: 17   Global Step: 222390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:25,979-Speed 3035.00 samples/sec   Loss 1.6123   LearningRate 0.0011   Epoch: 17   Global Step: 222400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:29,368-Speed 3022.21 samples/sec   Loss 1.6007   LearningRate 0.0011   Epoch: 17   Global Step: 222410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 22:47:32,748-Speed 3031.16 samples/sec   Loss 1.6056   LearningRate 0.0011   Epoch: 17   Global Step: 222420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:36,127-Speed 3031.06 samples/sec   Loss 1.6264   LearningRate 0.0011   Epoch: 17   Global Step: 222430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:39,528-Speed 3011.74 samples/sec   Loss 1.5778   LearningRate 0.0011   Epoch: 17   Global Step: 222440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:42,958-Speed 2986.10 samples/sec   Loss 1.6110   LearningRate 0.0011   Epoch: 17   Global Step: 222450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:46,342-Speed 3027.08 samples/sec   Loss 1.6080   LearningRate 0.0011   Epoch: 17   Global Step: 222460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:49,686-Speed 3063.31 samples/sec   Loss 1.6151   LearningRate 0.0011   Epoch: 17   Global Step: 222470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:53,118-Speed 2985.14 samples/sec   Loss 1.5857   LearningRate 0.0011   Epoch: 17   Global Step: 222480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:56,467-Speed 3058.37 samples/sec   Loss 1.6368   LearningRate 0.0011   Epoch: 17   Global Step: 222490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:47:59,854-Speed 3024.26 samples/sec   Loss 1.5945   LearningRate 0.0011   Epoch: 17   Global Step: 222500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:48:03,269-Speed 2999.01 samples/sec   Loss 1.6240   LearningRate 0.0011   Epoch: 17   Global Step: 222510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:48:06,644-Speed 3035.58 samples/sec   Loss 1.6818   LearningRate 0.0011   Epoch: 17   Global Step: 222520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 22:48:10,030-Speed 3024.90 samples/sec   Loss 1.5985   LearningRate 0.0011   Epoch: 17   Global Step: 222530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:48:13,462-Speed 2984.33 samples/sec   Loss 1.6165   LearningRate 0.0011   Epoch: 17   Global Step: 222540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:48:16,825-Speed 3047.84 samples/sec   Loss 1.5852   LearningRate 0.0011   Epoch: 17   Global Step: 222550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:48:20,200-Speed 3034.91 samples/sec   Loss 1.5897   LearningRate 0.0011   Epoch: 17   Global Step: 222560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:48:23,576-Speed 3034.17 samples/sec   Loss 1.6127   LearningRate 0.0011   Epoch: 17   Global Step: 222570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:48:26,938-Speed 3046.00 samples/sec   Loss 1.6649   LearningRate 0.0011   Epoch: 17   Global Step: 222580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:48:30,321-Speed 3027.94 samples/sec   Loss 1.6450   LearningRate 0.0011   Epoch: 17   Global Step: 222590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:48:33,723-Speed 3010.66 samples/sec   Loss 1.5816   LearningRate 0.0011   Epoch: 17   Global Step: 222600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:48:37,147-Speed 2991.90 samples/sec   Loss 1.6229   LearningRate 0.0011   Epoch: 17   Global Step: 222610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:48:40,641-Speed 2931.57 samples/sec   Loss 1.5955   LearningRate 0.0011   Epoch: 17   Global Step: 222620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:48:44,080-Speed 2978.44 samples/sec   Loss 1.5260   LearningRate 0.0011   Epoch: 17   Global Step: 222630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:48:47,473-Speed 3019.07 samples/sec   Loss 1.5713   LearningRate 0.0011   Epoch: 17   Global Step: 222640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:48:50,796-Speed 3082.60 samples/sec   Loss 1.6499   LearningRate 0.0011   Epoch: 17   Global Step: 222650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:48:54,150-Speed 3053.70 samples/sec   Loss 1.6210   LearningRate 0.0011   Epoch: 17   Global Step: 222660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:48:57,514-Speed 3044.23 samples/sec   Loss 1.6168   LearningRate 0.0011   Epoch: 17   Global Step: 222670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:49:00,858-Speed 3063.48 samples/sec   Loss 1.5380   LearningRate 0.0011   Epoch: 17   Global Step: 222680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:49:04,251-Speed 3018.84 samples/sec   Loss 1.6291   LearningRate 0.0011   Epoch: 17   Global Step: 222690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:49:07,669-Speed 2996.72 samples/sec   Loss 1.5552   LearningRate 0.0011   Epoch: 17   Global Step: 222700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:49:11,077-Speed 3005.73 samples/sec   Loss 1.6088   LearningRate 0.0011   Epoch: 17   Global Step: 222710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:49:14,419-Speed 3064.28 samples/sec   Loss 1.6230   LearningRate 0.0011   Epoch: 17   Global Step: 222720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:49:17,768-Speed 3058.57 samples/sec   Loss 1.6674   LearningRate 0.0011   Epoch: 17   Global Step: 222730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:49:21,142-Speed 3036.61 samples/sec   Loss 1.5664   LearningRate 0.0011   Epoch: 17   Global Step: 222740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:49:24,496-Speed 3053.95 samples/sec   Loss 1.6118   LearningRate 0.0011   Epoch: 17   Global Step: 222750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:49:27,933-Speed 2980.22 samples/sec   Loss 1.6509   LearningRate 0.0011   Epoch: 17   Global Step: 222760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:49:31,331-Speed 3013.71 samples/sec   Loss 1.6628   LearningRate 0.0011   Epoch: 17   Global Step: 222770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:49:35,319-Speed 2568.23 samples/sec   Loss 1.6359   LearningRate 0.0011   Epoch: 17   Global Step: 222780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:49:38,681-Speed 3046.79 samples/sec   Loss 1.6140   LearningRate 0.0011   Epoch: 17   Global Step: 222790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:49:42,074-Speed 3019.40 samples/sec   Loss 1.5895   LearningRate 0.0011   Epoch: 17   Global Step: 222800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:49:45,453-Speed 3030.98 samples/sec   Loss 1.6364   LearningRate 0.0011   Epoch: 17   Global Step: 222810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:49:48,798-Speed 3062.62 samples/sec   Loss 1.5735   LearningRate 0.0011   Epoch: 17   Global Step: 222820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:49:52,169-Speed 3038.70 samples/sec   Loss 1.6103   LearningRate 0.0011   Epoch: 17   Global Step: 222830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:49:55,507-Speed 3068.89 samples/sec   Loss 1.5956   LearningRate 0.0011   Epoch: 17   Global Step: 222840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:49:58,879-Speed 3037.21 samples/sec   Loss 1.5427   LearningRate 0.0011   Epoch: 17   Global Step: 222850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:50:02,201-Speed 3083.42 samples/sec   Loss 1.6540   LearningRate 0.0011   Epoch: 17   Global Step: 222860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:50:05,552-Speed 3057.09 samples/sec   Loss 1.6061   LearningRate 0.0011   Epoch: 17   Global Step: 222870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:50:08,943-Speed 3020.99 samples/sec   Loss 1.5987   LearningRate 0.0011   Epoch: 17   Global Step: 222880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:50:12,359-Speed 2998.77 samples/sec   Loss 1.5974   LearningRate 0.0011   Epoch: 17   Global Step: 222890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:50:15,722-Speed 3045.56 samples/sec   Loss 1.6251   LearningRate 0.0011   Epoch: 17   Global Step: 222900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:50:19,174-Speed 2967.53 samples/sec   Loss 1.5622   LearningRate 0.0011   Epoch: 17   Global Step: 222910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:50:22,600-Speed 2989.64 samples/sec   Loss 1.6505   LearningRate 0.0011   Epoch: 17   Global Step: 222920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:50:25,998-Speed 3014.24 samples/sec   Loss 1.6152   LearningRate 0.0011   Epoch: 17   Global Step: 222930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:50:29,369-Speed 3038.72 samples/sec   Loss 1.6102   LearningRate 0.0011   Epoch: 17   Global Step: 222940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:50:32,814-Speed 2973.48 samples/sec   Loss 1.6115   LearningRate 0.0011   Epoch: 17   Global Step: 222950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:50:36,258-Speed 2973.90 samples/sec   Loss 1.6062   LearningRate 0.0011   Epoch: 17   Global Step: 222960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:50:39,698-Speed 2977.64 samples/sec   Loss 1.5570   LearningRate 0.0010   Epoch: 17   Global Step: 222970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:50:43,063-Speed 3043.62 samples/sec   Loss 1.5709   LearningRate 0.0010   Epoch: 17   Global Step: 222980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:50:46,383-Speed 3085.01 samples/sec   Loss 1.5802   LearningRate 0.0010   Epoch: 17   Global Step: 222990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:50:49,776-Speed 3019.03 samples/sec   Loss 1.6140   LearningRate 0.0010   Epoch: 17   Global Step: 223000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:50:53,138-Speed 3046.20 samples/sec   Loss 1.5713   LearningRate 0.0010   Epoch: 17   Global Step: 223010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:50:56,605-Speed 2954.64 samples/sec   Loss 1.5581   LearningRate 0.0010   Epoch: 17   Global Step: 223020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:00,099-Speed 2931.35 samples/sec   Loss 1.6203   LearningRate 0.0010   Epoch: 17   Global Step: 223030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:03,492-Speed 3019.27 samples/sec   Loss 1.6180   LearningRate 0.0010   Epoch: 17   Global Step: 223040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:06,865-Speed 3037.13 samples/sec   Loss 1.5776   LearningRate 0.0010   Epoch: 17   Global Step: 223050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:10,226-Speed 3047.09 samples/sec   Loss 1.6406   LearningRate 0.0010   Epoch: 17   Global Step: 223060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:13,616-Speed 3021.64 samples/sec   Loss 1.5823   LearningRate 0.0010   Epoch: 17   Global Step: 223070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:16,968-Speed 3056.14 samples/sec   Loss 1.5603   LearningRate 0.0010   Epoch: 17   Global Step: 223080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:20,312-Speed 3062.57 samples/sec   Loss 1.6672   LearningRate 0.0010   Epoch: 17   Global Step: 223090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 22:51:23,667-Speed 3053.20 samples/sec   Loss 1.5904   LearningRate 0.0010   Epoch: 17   Global Step: 223100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:27,127-Speed 2960.07 samples/sec   Loss 1.5651   LearningRate 0.0010   Epoch: 17   Global Step: 223110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:30,538-Speed 3003.28 samples/sec   Loss 1.5322   LearningRate 0.0010   Epoch: 17   Global Step: 223120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:33,900-Speed 3046.89 samples/sec   Loss 1.5961   LearningRate 0.0010   Epoch: 17   Global Step: 223130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:37,339-Speed 2978.03 samples/sec   Loss 1.6181   LearningRate 0.0010   Epoch: 17   Global Step: 223140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:40,653-Speed 3090.77 samples/sec   Loss 1.6016   LearningRate 0.0010   Epoch: 17   Global Step: 223150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:44,075-Speed 2993.26 samples/sec   Loss 1.6072   LearningRate 0.0010   Epoch: 17   Global Step: 223160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:47,418-Speed 3064.32 samples/sec   Loss 1.5894   LearningRate 0.0010   Epoch: 17   Global Step: 223170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:50,840-Speed 2993.12 samples/sec   Loss 1.5923   LearningRate 0.0010   Epoch: 17   Global Step: 223180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:54,201-Speed 3048.70 samples/sec   Loss 1.6384   LearningRate 0.0010   Epoch: 17   Global Step: 223190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:51:57,572-Speed 3037.84 samples/sec   Loss 1.6155   LearningRate 0.0010   Epoch: 17   Global Step: 223200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:01,021-Speed 2970.67 samples/sec   Loss 1.6889   LearningRate 0.0010   Epoch: 17   Global Step: 223210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:04,416-Speed 3016.53 samples/sec   Loss 1.5852   LearningRate 0.0010   Epoch: 17   Global Step: 223220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:07,754-Speed 3068.99 samples/sec   Loss 1.6729   LearningRate 0.0010   Epoch: 17   Global Step: 223230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:11,101-Speed 3060.06 samples/sec   Loss 1.5842   LearningRate 0.0010   Epoch: 17   Global Step: 223240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:14,473-Speed 3037.74 samples/sec   Loss 1.6069   LearningRate 0.0010   Epoch: 17   Global Step: 223250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:17,850-Speed 3033.36 samples/sec   Loss 1.5350   LearningRate 0.0010   Epoch: 17   Global Step: 223260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:21,301-Speed 2968.40 samples/sec   Loss 1.6060   LearningRate 0.0010   Epoch: 17   Global Step: 223270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:24,756-Speed 2964.95 samples/sec   Loss 1.6083   LearningRate 0.0010   Epoch: 17   Global Step: 223280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:28,171-Speed 2999.02 samples/sec   Loss 1.6179   LearningRate 0.0010   Epoch: 17   Global Step: 223290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:31,538-Speed 3042.87 samples/sec   Loss 1.5595   LearningRate 0.0010   Epoch: 17   Global Step: 223300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:52:34,911-Speed 3036.12 samples/sec   Loss 1.6989   LearningRate 0.0010   Epoch: 17   Global Step: 223310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:52:38,353-Speed 2976.42 samples/sec   Loss 1.5609   LearningRate 0.0010   Epoch: 17   Global Step: 223320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:52:41,717-Speed 3044.61 samples/sec   Loss 1.5723   LearningRate 0.0010   Epoch: 17   Global Step: 223330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:52:45,082-Speed 3043.92 samples/sec   Loss 1.6360   LearningRate 0.0010   Epoch: 17   Global Step: 223340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:48,478-Speed 3016.86 samples/sec   Loss 1.5677   LearningRate 0.0010   Epoch: 17   Global Step: 223350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:51,893-Speed 2998.90 samples/sec   Loss 1.6001   LearningRate 0.0010   Epoch: 17   Global Step: 223360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:55,286-Speed 3019.05 samples/sec   Loss 1.5883   LearningRate 0.0010   Epoch: 17   Global Step: 223370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:52:58,661-Speed 3034.71 samples/sec   Loss 1.6071   LearningRate 0.0010   Epoch: 17   Global Step: 223380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:02,100-Speed 2978.45 samples/sec   Loss 1.6109   LearningRate 0.0010   Epoch: 17   Global Step: 223390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:05,443-Speed 3064.07 samples/sec   Loss 1.5651   LearningRate 0.0010   Epoch: 17   Global Step: 223400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:08,830-Speed 3024.12 samples/sec   Loss 1.5847   LearningRate 0.0010   Epoch: 17   Global Step: 223410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:12,279-Speed 2969.55 samples/sec   Loss 1.5964   LearningRate 0.0010   Epoch: 17   Global Step: 223420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:15,704-Speed 2991.51 samples/sec   Loss 1.6255   LearningRate 0.0010   Epoch: 17   Global Step: 223430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:19,038-Speed 3072.00 samples/sec   Loss 1.6855   LearningRate 0.0010   Epoch: 17   Global Step: 223440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:53:22,421-Speed 3028.19 samples/sec   Loss 1.6379   LearningRate 0.0010   Epoch: 17   Global Step: 223450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:53:25,794-Speed 3036.49 samples/sec   Loss 1.5715   LearningRate 0.0010   Epoch: 17   Global Step: 223460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:29,142-Speed 3059.74 samples/sec   Loss 1.5846   LearningRate 0.0010   Epoch: 17   Global Step: 223470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:32,500-Speed 3049.87 samples/sec   Loss 1.5768   LearningRate 0.0010   Epoch: 17   Global Step: 223480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:35,859-Speed 3050.31 samples/sec   Loss 1.5279   LearningRate 0.0010   Epoch: 17   Global Step: 223490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:39,231-Speed 3036.89 samples/sec   Loss 1.5853   LearningRate 0.0010   Epoch: 17   Global Step: 223500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:42,615-Speed 3026.56 samples/sec   Loss 1.5842   LearningRate 0.0010   Epoch: 17   Global Step: 223510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:46,048-Speed 2983.82 samples/sec   Loss 1.5701   LearningRate 0.0010   Epoch: 17   Global Step: 223520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:49,439-Speed 3021.03 samples/sec   Loss 1.5941   LearningRate 0.0010   Epoch: 17   Global Step: 223530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:52,848-Speed 3004.96 samples/sec   Loss 1.6202   LearningRate 0.0010   Epoch: 17   Global Step: 223540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:56,254-Speed 3007.20 samples/sec   Loss 1.6191   LearningRate 0.0010   Epoch: 17   Global Step: 223550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:53:59,647-Speed 3018.43 samples/sec   Loss 1.5592   LearningRate 0.0010   Epoch: 17   Global Step: 223560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:54:03,261-Speed 2834.05 samples/sec   Loss 1.6009   LearningRate 0.0010   Epoch: 17   Global Step: 223570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:54:36,514-Speed 307.96 samples/sec   Loss 1.5028   LearningRate 0.0010   Epoch: 18   Global Step: 223580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:54:39,869-Speed 3052.87 samples/sec   Loss 1.0186   LearningRate 0.0010   Epoch: 18   Global Step: 223590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:54:43,473-Speed 2842.78 samples/sec   Loss 1.0744   LearningRate 0.0010   Epoch: 18   Global Step: 223600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:54:46,860-Speed 3024.28 samples/sec   Loss 1.0923   LearningRate 0.0010   Epoch: 18   Global Step: 223610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:54:50,255-Speed 3016.19 samples/sec   Loss 1.0870   LearningRate 0.0010   Epoch: 18   Global Step: 223620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:54:53,615-Speed 3049.47 samples/sec   Loss 1.0584   LearningRate 0.0010   Epoch: 18   Global Step: 223630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:54:57,052-Speed 2980.10 samples/sec   Loss 1.0362   LearningRate 0.0010   Epoch: 18   Global Step: 223640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:55:00,383-Speed 3076.43 samples/sec   Loss 1.0382   LearningRate 0.0010   Epoch: 18   Global Step: 223650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:55:03,781-Speed 3014.42 samples/sec   Loss 1.0248   LearningRate 0.0010   Epoch: 18   Global Step: 223660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 22:55:07,171-Speed 3021.04 samples/sec   Loss 1.0418   LearningRate 0.0010   Epoch: 18   Global Step: 223670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:55:10,528-Speed 3051.60 samples/sec   Loss 1.0276   LearningRate 0.0010   Epoch: 18   Global Step: 223680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:55:13,938-Speed 3004.20 samples/sec   Loss 1.0236   LearningRate 0.0010   Epoch: 18   Global Step: 223690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:55:17,276-Speed 3067.94 samples/sec   Loss 1.0607   LearningRate 0.0010   Epoch: 18   Global Step: 223700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:55:20,679-Speed 3010.28 samples/sec   Loss 1.0488   LearningRate 0.0010   Epoch: 18   Global Step: 223710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:55:24,113-Speed 2984.02 samples/sec   Loss 1.0765   LearningRate 0.0010   Epoch: 18   Global Step: 223720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:55:27,451-Speed 3068.23 samples/sec   Loss 1.0895   LearningRate 0.0010   Epoch: 18   Global Step: 223730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:55:30,862-Speed 3003.49 samples/sec   Loss 0.9874   LearningRate 0.0010   Epoch: 18   Global Step: 223740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:55:34,194-Speed 3073.54 samples/sec   Loss 1.0460   LearningRate 0.0010   Epoch: 18   Global Step: 223750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:55:37,570-Speed 3034.30 samples/sec   Loss 1.0821   LearningRate 0.0010   Epoch: 18   Global Step: 223760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:55:40,973-Speed 3009.77 samples/sec   Loss 1.0897   LearningRate 0.0010   Epoch: 18   Global Step: 223770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:55:44,343-Speed 3038.95 samples/sec   Loss 1.0823   LearningRate 0.0010   Epoch: 18   Global Step: 223780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:55:47,690-Speed 3060.58 samples/sec   Loss 1.0314   LearningRate 0.0010   Epoch: 18   Global Step: 223790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:55:51,174-Speed 2940.65 samples/sec   Loss 1.0480   LearningRate 0.0010   Epoch: 18   Global Step: 223800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:55:54,586-Speed 3002.19 samples/sec   Loss 1.0398   LearningRate 0.0010   Epoch: 18   Global Step: 223810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:55:58,036-Speed 2968.25 samples/sec   Loss 1.0921   LearningRate 0.0010   Epoch: 18   Global Step: 223820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:56:01,431-Speed 3017.38 samples/sec   Loss 1.0117   LearningRate 0.0010   Epoch: 18   Global Step: 223830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:04,837-Speed 3007.25 samples/sec   Loss 1.0539   LearningRate 0.0010   Epoch: 18   Global Step: 223840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:08,397-Speed 2878.16 samples/sec   Loss 1.0420   LearningRate 0.0010   Epoch: 18   Global Step: 223850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:11,724-Speed 3078.85 samples/sec   Loss 1.0404   LearningRate 0.0010   Epoch: 18   Global Step: 223860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:15,246-Speed 2907.72 samples/sec   Loss 1.0633   LearningRate 0.0010   Epoch: 18   Global Step: 223870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:18,687-Speed 2977.53 samples/sec   Loss 1.0453   LearningRate 0.0010   Epoch: 18   Global Step: 223880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:22,135-Speed 2970.29 samples/sec   Loss 1.0374   LearningRate 0.0010   Epoch: 18   Global Step: 223890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:25,591-Speed 2964.27 samples/sec   Loss 1.0502   LearningRate 0.0010   Epoch: 18   Global Step: 223900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:28,999-Speed 3004.96 samples/sec   Loss 1.0487   LearningRate 0.0010   Epoch: 18   Global Step: 223910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:32,471-Speed 2949.95 samples/sec   Loss 1.0595   LearningRate 0.0010   Epoch: 18   Global Step: 223920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:35,949-Speed 2944.97 samples/sec   Loss 1.0410   LearningRate 0.0010   Epoch: 18   Global Step: 223930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 22:56:39,365-Speed 2999.45 samples/sec   Loss 1.0324   LearningRate 0.0010   Epoch: 18   Global Step: 223940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:42,754-Speed 3021.83 samples/sec   Loss 1.0420   LearningRate 0.0010   Epoch: 18   Global Step: 223950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:46,125-Speed 3039.02 samples/sec   Loss 1.0318   LearningRate 0.0010   Epoch: 18   Global Step: 223960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:49,540-Speed 2999.65 samples/sec   Loss 1.0662   LearningRate 0.0010   Epoch: 18   Global Step: 223970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:52,895-Speed 3052.95 samples/sec   Loss 1.0209   LearningRate 0.0010   Epoch: 18   Global Step: 223980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:56,270-Speed 3034.47 samples/sec   Loss 1.0428   LearningRate 0.0010   Epoch: 18   Global Step: 223990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:56:59,719-Speed 2970.03 samples/sec   Loss 0.9975   LearningRate 0.0010   Epoch: 18   Global Step: 224000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:57:03,116-Speed 3015.82 samples/sec   Loss 1.0383   LearningRate 0.0010   Epoch: 18   Global Step: 224010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:57:06,601-Speed 2939.31 samples/sec   Loss 1.0710   LearningRate 0.0010   Epoch: 18   Global Step: 224020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:57:10,037-Speed 2980.93 samples/sec   Loss 0.9964   LearningRate 0.0010   Epoch: 18   Global Step: 224030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:57:13,414-Speed 3032.87 samples/sec   Loss 1.0580   LearningRate 0.0010   Epoch: 18   Global Step: 224040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 22:57:16,909-Speed 2931.18 samples/sec   Loss 1.0164   LearningRate 0.0010   Epoch: 18   Global Step: 224050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 22:57:20,282-Speed 3036.86 samples/sec   Loss 1.0568   LearningRate 0.0010   Epoch: 18   Global Step: 224060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 22:57:23,795-Speed 2915.35 samples/sec   Loss 1.0536   LearningRate 0.0010   Epoch: 18   Global Step: 224070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 22:57:27,232-Speed 2980.07 samples/sec   Loss 1.0230   LearningRate 0.0010   Epoch: 18   Global Step: 224080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 22:57:30,652-Speed 2995.25 samples/sec   Loss 1.0385   LearningRate 0.0010   Epoch: 18   Global Step: 224090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:57:34,099-Speed 2971.89 samples/sec   Loss 1.0053   LearningRate 0.0010   Epoch: 18   Global Step: 224100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:57:37,472-Speed 3036.56 samples/sec   Loss 1.0313   LearningRate 0.0010   Epoch: 18   Global Step: 224110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:57:40,816-Speed 3063.20 samples/sec   Loss 1.0374   LearningRate 0.0010   Epoch: 18   Global Step: 224120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:57:44,246-Speed 2986.36 samples/sec   Loss 1.0414   LearningRate 0.0010   Epoch: 18   Global Step: 224130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:57:47,639-Speed 3019.02 samples/sec   Loss 1.0443   LearningRate 0.0010   Epoch: 18   Global Step: 224140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:57:51,054-Speed 2999.27 samples/sec   Loss 1.0771   LearningRate 0.0010   Epoch: 18   Global Step: 224150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:57:54,460-Speed 3007.66 samples/sec   Loss 1.0642   LearningRate 0.0010   Epoch: 18   Global Step: 224160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:57:57,806-Speed 3060.95 samples/sec   Loss 1.0198   LearningRate 0.0010   Epoch: 18   Global Step: 224170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:58:01,121-Speed 3089.77 samples/sec   Loss 1.0437   LearningRate 0.0010   Epoch: 18   Global Step: 224180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:58:04,474-Speed 3054.59 samples/sec   Loss 1.0220   LearningRate 0.0010   Epoch: 18   Global Step: 224190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:58:07,848-Speed 3035.79 samples/sec   Loss 1.0932   LearningRate 0.0010   Epoch: 18   Global Step: 224200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:58:11,192-Speed 3063.60 samples/sec   Loss 1.0565   LearningRate 0.0009   Epoch: 18   Global Step: 224210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:58:14,609-Speed 2996.87 samples/sec   Loss 1.0389   LearningRate 0.0009   Epoch: 18   Global Step: 224220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:58:17,928-Speed 3086.90 samples/sec   Loss 1.0880   LearningRate 0.0009   Epoch: 18   Global Step: 224230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:58:21,243-Speed 3090.10 samples/sec   Loss 1.0696   LearningRate 0.0009   Epoch: 18   Global Step: 224240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:58:24,644-Speed 3011.67 samples/sec   Loss 1.1020   LearningRate 0.0009   Epoch: 18   Global Step: 224250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:58:28,088-Speed 2973.73 samples/sec   Loss 1.0613   LearningRate 0.0009   Epoch: 18   Global Step: 224260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:58:31,487-Speed 3013.90 samples/sec   Loss 1.0632   LearningRate 0.0009   Epoch: 18   Global Step: 224270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:58:34,872-Speed 3026.05 samples/sec   Loss 1.0607   LearningRate 0.0009   Epoch: 18   Global Step: 224280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:58:38,375-Speed 2924.13 samples/sec   Loss 1.0399   LearningRate 0.0009   Epoch: 18   Global Step: 224290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:58:41,807-Speed 2985.39 samples/sec   Loss 1.0610   LearningRate 0.0009   Epoch: 18   Global Step: 224300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:58:45,237-Speed 2985.29 samples/sec   Loss 1.0669   LearningRate 0.0009   Epoch: 18   Global Step: 224310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:58:48,614-Speed 3033.51 samples/sec   Loss 1.0453   LearningRate 0.0009   Epoch: 18   Global Step: 224320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:58:52,074-Speed 2960.85 samples/sec   Loss 1.0970   LearningRate 0.0009   Epoch: 18   Global Step: 224330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:58:55,499-Speed 2990.65 samples/sec   Loss 1.0416   LearningRate 0.0009   Epoch: 18   Global Step: 224340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:58:59,009-Speed 2918.46 samples/sec   Loss 1.0462   LearningRate 0.0009   Epoch: 18   Global Step: 224350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:59:02,326-Speed 3087.52 samples/sec   Loss 1.0550   LearningRate 0.0009   Epoch: 18   Global Step: 224360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:59:05,732-Speed 3007.63 samples/sec   Loss 1.0395   LearningRate 0.0009   Epoch: 18   Global Step: 224370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:59:09,095-Speed 3045.34 samples/sec   Loss 1.0092   LearningRate 0.0009   Epoch: 18   Global Step: 224380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:59:12,458-Speed 3046.27 samples/sec   Loss 1.0712   LearningRate 0.0009   Epoch: 18   Global Step: 224390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:59:15,840-Speed 3029.00 samples/sec   Loss 1.0564   LearningRate 0.0009   Epoch: 18   Global Step: 224400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 22:59:19,166-Speed 3079.94 samples/sec   Loss 1.0697   LearningRate 0.0009   Epoch: 18   Global Step: 224410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:59:22,528-Speed 3046.90 samples/sec   Loss 1.0435   LearningRate 0.0009   Epoch: 18   Global Step: 224420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:59:25,971-Speed 2975.04 samples/sec   Loss 1.0805   LearningRate 0.0009   Epoch: 18   Global Step: 224430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:59:29,318-Speed 3060.19 samples/sec   Loss 1.0342   LearningRate 0.0009   Epoch: 18   Global Step: 224440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:59:32,722-Speed 3008.28 samples/sec   Loss 1.0400   LearningRate 0.0009   Epoch: 18   Global Step: 224450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:59:36,064-Speed 3065.76 samples/sec   Loss 1.0393   LearningRate 0.0009   Epoch: 18   Global Step: 224460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:59:39,458-Speed 3017.65 samples/sec   Loss 1.0962   LearningRate 0.0009   Epoch: 18   Global Step: 224470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:59:42,830-Speed 3037.40 samples/sec   Loss 1.0502   LearningRate 0.0009   Epoch: 18   Global Step: 224480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:59:46,373-Speed 2891.21 samples/sec   Loss 1.0267   LearningRate 0.0009   Epoch: 18   Global Step: 224490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:59:49,694-Speed 3084.21 samples/sec   Loss 1.0239   LearningRate 0.0009   Epoch: 18   Global Step: 224500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:59:53,099-Speed 3008.13 samples/sec   Loss 1.0760   LearningRate 0.0009   Epoch: 18   Global Step: 224510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 22:59:56,496-Speed 3015.46 samples/sec   Loss 1.0489   LearningRate 0.0009   Epoch: 18   Global Step: 224520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 22:59:59,894-Speed 3014.31 samples/sec   Loss 1.0908   LearningRate 0.0009   Epoch: 18   Global Step: 224530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:03,275-Speed 3030.50 samples/sec   Loss 1.0401   LearningRate 0.0009   Epoch: 18   Global Step: 224540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:06,639-Speed 3044.03 samples/sec   Loss 1.0684   LearningRate 0.0009   Epoch: 18   Global Step: 224550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:10,112-Speed 2949.73 samples/sec   Loss 1.0968   LearningRate 0.0009   Epoch: 18   Global Step: 224560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:13,608-Speed 2929.48 samples/sec   Loss 1.0736   LearningRate 0.0009   Epoch: 18   Global Step: 224570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:17,049-Speed 2977.02 samples/sec   Loss 1.0543   LearningRate 0.0009   Epoch: 18   Global Step: 224580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:20,491-Speed 2975.65 samples/sec   Loss 1.0807   LearningRate 0.0009   Epoch: 18   Global Step: 224590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:23,964-Speed 2949.40 samples/sec   Loss 1.0633   LearningRate 0.0009   Epoch: 18   Global Step: 224600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:27,400-Speed 2981.00 samples/sec   Loss 1.0537   LearningRate 0.0009   Epoch: 18   Global Step: 224610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:30,787-Speed 3024.43 samples/sec   Loss 1.0330   LearningRate 0.0009   Epoch: 18   Global Step: 224620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:34,112-Speed 3081.00 samples/sec   Loss 1.0786   LearningRate 0.0009   Epoch: 18   Global Step: 224630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:00:37,466-Speed 3053.36 samples/sec   Loss 1.0901   LearningRate 0.0009   Epoch: 18   Global Step: 224640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:00:40,872-Speed 3007.70 samples/sec   Loss 1.0859   LearningRate 0.0009   Epoch: 18   Global Step: 224650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:00:44,257-Speed 3025.12 samples/sec   Loss 1.0449   LearningRate 0.0009   Epoch: 18   Global Step: 224660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:00:47,584-Speed 3079.36 samples/sec   Loss 1.0803   LearningRate 0.0009   Epoch: 18   Global Step: 224670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:50,917-Speed 3073.11 samples/sec   Loss 1.0984   LearningRate 0.0009   Epoch: 18   Global Step: 224680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:54,217-Speed 3103.55 samples/sec   Loss 1.0798   LearningRate 0.0009   Epoch: 18   Global Step: 224690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:00:57,622-Speed 3008.49 samples/sec   Loss 1.0639   LearningRate 0.0009   Epoch: 18   Global Step: 224700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:01:01,027-Speed 3008.49 samples/sec   Loss 1.0822   LearningRate 0.0009   Epoch: 18   Global Step: 224710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:01:04,395-Speed 3040.41 samples/sec   Loss 1.0927   LearningRate 0.0009   Epoch: 18   Global Step: 224720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:01:07,766-Speed 3038.38 samples/sec   Loss 1.0659   LearningRate 0.0009   Epoch: 18   Global Step: 224730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:01:11,094-Speed 3078.02 samples/sec   Loss 1.0474   LearningRate 0.0009   Epoch: 18   Global Step: 224740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:01:14,490-Speed 3016.09 samples/sec   Loss 1.0149   LearningRate 0.0009   Epoch: 18   Global Step: 224750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:01:17,835-Speed 3062.37 samples/sec   Loss 1.0754   LearningRate 0.0009   Epoch: 18   Global Step: 224760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:01:21,160-Speed 3080.40 samples/sec   Loss 1.0701   LearningRate 0.0009   Epoch: 18   Global Step: 224770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:01:24,467-Speed 3098.04 samples/sec   Loss 1.0262   LearningRate 0.0009   Epoch: 18   Global Step: 224780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:01:27,854-Speed 3024.01 samples/sec   Loss 1.0933   LearningRate 0.0009   Epoch: 18   Global Step: 224790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:01:31,227-Speed 3036.49 samples/sec   Loss 1.0312   LearningRate 0.0009   Epoch: 18   Global Step: 224800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:01:34,570-Speed 3064.48 samples/sec   Loss 1.0683   LearningRate 0.0009   Epoch: 18   Global Step: 224810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:01:37,919-Speed 3058.12 samples/sec   Loss 1.0773   LearningRate 0.0009   Epoch: 18   Global Step: 224820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:01:41,255-Speed 3070.18 samples/sec   Loss 1.0900   LearningRate 0.0009   Epoch: 18   Global Step: 224830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:01:44,592-Speed 3070.07 samples/sec   Loss 1.1083   LearningRate 0.0009   Epoch: 18   Global Step: 224840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:01:47,945-Speed 3054.93 samples/sec   Loss 1.0758   LearningRate 0.0009   Epoch: 18   Global Step: 224850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:01:51,301-Speed 3052.03 samples/sec   Loss 1.0477   LearningRate 0.0009   Epoch: 18   Global Step: 224860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:01:54,654-Speed 3054.62 samples/sec   Loss 1.1191   LearningRate 0.0009   Epoch: 18   Global Step: 224870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:01:58,061-Speed 3006.44 samples/sec   Loss 1.0911   LearningRate 0.0009   Epoch: 18   Global Step: 224880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:01,426-Speed 3043.72 samples/sec   Loss 1.0875   LearningRate 0.0009   Epoch: 18   Global Step: 224890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:04,820-Speed 3018.06 samples/sec   Loss 1.0529   LearningRate 0.0009   Epoch: 18   Global Step: 224900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:08,146-Speed 3079.82 samples/sec   Loss 1.0575   LearningRate 0.0009   Epoch: 18   Global Step: 224910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:11,509-Speed 3045.85 samples/sec   Loss 0.9983   LearningRate 0.0009   Epoch: 18   Global Step: 224920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:14,896-Speed 3023.89 samples/sec   Loss 1.0634   LearningRate 0.0009   Epoch: 18   Global Step: 224930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:18,233-Speed 3069.72 samples/sec   Loss 1.0654   LearningRate 0.0009   Epoch: 18   Global Step: 224940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:02:21,629-Speed 3016.25 samples/sec   Loss 1.1057   LearningRate 0.0009   Epoch: 18   Global Step: 224950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:02:25,067-Speed 2979.07 samples/sec   Loss 1.1036   LearningRate 0.0009   Epoch: 18   Global Step: 224960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:02:28,453-Speed 3024.99 samples/sec   Loss 1.0666   LearningRate 0.0009   Epoch: 18   Global Step: 224970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:31,816-Speed 3046.39 samples/sec   Loss 1.0423   LearningRate 0.0009   Epoch: 18   Global Step: 224980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:35,121-Speed 3099.36 samples/sec   Loss 1.0474   LearningRate 0.0009   Epoch: 18   Global Step: 224990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:38,459-Speed 3068.05 samples/sec   Loss 1.0582   LearningRate 0.0009   Epoch: 18   Global Step: 225000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:41,855-Speed 3016.46 samples/sec   Loss 1.0699   LearningRate 0.0009   Epoch: 18   Global Step: 225010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:45,189-Speed 3072.29 samples/sec   Loss 1.0413   LearningRate 0.0009   Epoch: 18   Global Step: 225020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:48,671-Speed 2941.44 samples/sec   Loss 1.0010   LearningRate 0.0009   Epoch: 18   Global Step: 225030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:52,029-Speed 3050.75 samples/sec   Loss 1.0957   LearningRate 0.0009   Epoch: 18   Global Step: 225040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:55,410-Speed 3028.90 samples/sec   Loss 1.0648   LearningRate 0.0009   Epoch: 18   Global Step: 225050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:02:58,831-Speed 2994.26 samples/sec   Loss 1.0542   LearningRate 0.0009   Epoch: 18   Global Step: 225060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:03:02,240-Speed 3004.78 samples/sec   Loss 1.1048   LearningRate 0.0009   Epoch: 18   Global Step: 225070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:05,699-Speed 2961.20 samples/sec   Loss 1.0672   LearningRate 0.0009   Epoch: 18   Global Step: 225080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:09,170-Speed 2950.49 samples/sec   Loss 0.9809   LearningRate 0.0009   Epoch: 18   Global Step: 225090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:12,583-Speed 3000.88 samples/sec   Loss 1.0907   LearningRate 0.0009   Epoch: 18   Global Step: 225100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:15,928-Speed 3062.33 samples/sec   Loss 1.0410   LearningRate 0.0009   Epoch: 18   Global Step: 225110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:19,293-Speed 3044.66 samples/sec   Loss 1.0524   LearningRate 0.0009   Epoch: 18   Global Step: 225120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:22,643-Speed 3056.96 samples/sec   Loss 1.0710   LearningRate 0.0009   Epoch: 18   Global Step: 225130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:26,043-Speed 3012.92 samples/sec   Loss 1.0939   LearningRate 0.0009   Epoch: 18   Global Step: 225140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:29,426-Speed 3028.15 samples/sec   Loss 1.1032   LearningRate 0.0009   Epoch: 18   Global Step: 225150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:32,777-Speed 3056.24 samples/sec   Loss 1.0506   LearningRate 0.0009   Epoch: 18   Global Step: 225160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:36,142-Speed 3044.40 samples/sec   Loss 1.1030   LearningRate 0.0009   Epoch: 18   Global Step: 225170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:39,605-Speed 2957.64 samples/sec   Loss 1.0372   LearningRate 0.0009   Epoch: 18   Global Step: 225180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:43,080-Speed 2947.37 samples/sec   Loss 1.0374   LearningRate 0.0009   Epoch: 18   Global Step: 225190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:46,470-Speed 3021.68 samples/sec   Loss 1.0321   LearningRate 0.0009   Epoch: 18   Global Step: 225200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:49,808-Speed 3068.59 samples/sec   Loss 1.0369   LearningRate 0.0009   Epoch: 18   Global Step: 225210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:53,194-Speed 3025.58 samples/sec   Loss 1.0367   LearningRate 0.0009   Epoch: 18   Global Step: 225220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:56,529-Speed 3071.37 samples/sec   Loss 1.0949   LearningRate 0.0009   Epoch: 18   Global Step: 225230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:03:59,938-Speed 3004.99 samples/sec   Loss 1.0721   LearningRate 0.0009   Epoch: 18   Global Step: 225240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:03,323-Speed 3026.01 samples/sec   Loss 1.1274   LearningRate 0.0009   Epoch: 18   Global Step: 225250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:06,721-Speed 3013.55 samples/sec   Loss 1.1104   LearningRate 0.0009   Epoch: 18   Global Step: 225260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:10,131-Speed 3004.82 samples/sec   Loss 1.0695   LearningRate 0.0009   Epoch: 18   Global Step: 225270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:04:13,530-Speed 3014.18 samples/sec   Loss 1.0500   LearningRate 0.0009   Epoch: 18   Global Step: 225280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:16,895-Speed 3043.74 samples/sec   Loss 1.0781   LearningRate 0.0009   Epoch: 18   Global Step: 225290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:20,277-Speed 3028.63 samples/sec   Loss 1.0285   LearningRate 0.0009   Epoch: 18   Global Step: 225300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:23,714-Speed 2980.41 samples/sec   Loss 1.0662   LearningRate 0.0009   Epoch: 18   Global Step: 225310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:27,153-Speed 2978.63 samples/sec   Loss 1.0202   LearningRate 0.0009   Epoch: 18   Global Step: 225320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:30,614-Speed 2959.37 samples/sec   Loss 1.0588   LearningRate 0.0009   Epoch: 18   Global Step: 225330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:34,086-Speed 2949.95 samples/sec   Loss 1.0812   LearningRate 0.0009   Epoch: 18   Global Step: 225340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:37,507-Speed 2994.41 samples/sec   Loss 1.0952   LearningRate 0.0009   Epoch: 18   Global Step: 225350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:40,915-Speed 3004.82 samples/sec   Loss 1.0713   LearningRate 0.0009   Epoch: 18   Global Step: 225360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:44,331-Speed 2998.66 samples/sec   Loss 1.0931   LearningRate 0.0009   Epoch: 18   Global Step: 225370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:47,798-Speed 2954.32 samples/sec   Loss 1.0877   LearningRate 0.0009   Epoch: 18   Global Step: 225380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:04:51,234-Speed 2981.36 samples/sec   Loss 1.0684   LearningRate 0.0009   Epoch: 18   Global Step: 225390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:04:54,548-Speed 3090.34 samples/sec   Loss 1.0900   LearningRate 0.0009   Epoch: 18   Global Step: 225400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:04:57,892-Speed 3062.65 samples/sec   Loss 1.1153   LearningRate 0.0009   Epoch: 18   Global Step: 225410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:01,275-Speed 3028.06 samples/sec   Loss 1.0998   LearningRate 0.0009   Epoch: 18   Global Step: 225420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:04,697-Speed 2993.58 samples/sec   Loss 1.1168   LearningRate 0.0009   Epoch: 18   Global Step: 225430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:08,147-Speed 2969.32 samples/sec   Loss 1.1349   LearningRate 0.0009   Epoch: 18   Global Step: 225440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:11,556-Speed 3004.57 samples/sec   Loss 1.0776   LearningRate 0.0009   Epoch: 18   Global Step: 225450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:14,923-Speed 3042.37 samples/sec   Loss 1.1148   LearningRate 0.0009   Epoch: 18   Global Step: 225460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:18,324-Speed 3012.46 samples/sec   Loss 1.0676   LearningRate 0.0009   Epoch: 18   Global Step: 225470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:21,734-Speed 3003.38 samples/sec   Loss 1.0955   LearningRate 0.0009   Epoch: 18   Global Step: 225480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:25,148-Speed 3000.54 samples/sec   Loss 1.0955   LearningRate 0.0009   Epoch: 18   Global Step: 225490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:28,536-Speed 3023.85 samples/sec   Loss 1.1053   LearningRate 0.0009   Epoch: 18   Global Step: 225500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:05:31,933-Speed 3015.59 samples/sec   Loss 1.0724   LearningRate 0.0009   Epoch: 18   Global Step: 225510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:35,342-Speed 3004.56 samples/sec   Loss 1.0971   LearningRate 0.0008   Epoch: 18   Global Step: 225520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:38,736-Speed 3018.30 samples/sec   Loss 1.0482   LearningRate 0.0008   Epoch: 18   Global Step: 225530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:42,230-Speed 2931.12 samples/sec   Loss 1.0164   LearningRate 0.0008   Epoch: 18   Global Step: 225540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:45,660-Speed 2986.33 samples/sec   Loss 1.0613   LearningRate 0.0008   Epoch: 18   Global Step: 225550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:49,096-Speed 2981.53 samples/sec   Loss 1.1207   LearningRate 0.0008   Epoch: 18   Global Step: 225560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:52,544-Speed 2970.60 samples/sec   Loss 1.0726   LearningRate 0.0008   Epoch: 18   Global Step: 225570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:05:55,953-Speed 3004.08 samples/sec   Loss 1.0987   LearningRate 0.0008   Epoch: 18   Global Step: 225580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:05:59,338-Speed 3026.59 samples/sec   Loss 1.0906   LearningRate 0.0008   Epoch: 18   Global Step: 225590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:06:02,761-Speed 2992.55 samples/sec   Loss 1.1350   LearningRate 0.0008   Epoch: 18   Global Step: 225600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:06:06,170-Speed 3004.84 samples/sec   Loss 1.1172   LearningRate 0.0008   Epoch: 18   Global Step: 225610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:06:09,541-Speed 3038.84 samples/sec   Loss 1.0558   LearningRate 0.0008   Epoch: 18   Global Step: 225620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:06:12,944-Speed 3010.09 samples/sec   Loss 1.1290   LearningRate 0.0008   Epoch: 18   Global Step: 225630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:06:16,248-Speed 3100.07 samples/sec   Loss 1.0816   LearningRate 0.0008   Epoch: 18   Global Step: 225640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:06:19,638-Speed 3022.04 samples/sec   Loss 1.0687   LearningRate 0.0008   Epoch: 18   Global Step: 225650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:06:23,049-Speed 3002.31 samples/sec   Loss 1.0875   LearningRate 0.0008   Epoch: 18   Global Step: 225660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:06:26,381-Speed 3074.48 samples/sec   Loss 1.0522   LearningRate 0.0008   Epoch: 18   Global Step: 225670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:06:29,798-Speed 2997.62 samples/sec   Loss 1.1107   LearningRate 0.0008   Epoch: 18   Global Step: 225680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:06:33,225-Speed 2989.68 samples/sec   Loss 1.0856   LearningRate 0.0008   Epoch: 18   Global Step: 225690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:06:36,665-Speed 2977.03 samples/sec   Loss 1.0709   LearningRate 0.0008   Epoch: 18   Global Step: 225700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:06:40,065-Speed 3013.50 samples/sec   Loss 1.0949   LearningRate 0.0008   Epoch: 18   Global Step: 225710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:06:43,454-Speed 3021.76 samples/sec   Loss 1.1076   LearningRate 0.0008   Epoch: 18   Global Step: 225720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:06:46,802-Speed 3059.57 samples/sec   Loss 1.0435   LearningRate 0.0008   Epoch: 18   Global Step: 225730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:06:50,246-Speed 2974.57 samples/sec   Loss 1.0693   LearningRate 0.0008   Epoch: 18   Global Step: 225740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:06:53,691-Speed 2972.97 samples/sec   Loss 1.0956   LearningRate 0.0008   Epoch: 18   Global Step: 225750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:06:57,120-Speed 2986.98 samples/sec   Loss 1.0488   LearningRate 0.0008   Epoch: 18   Global Step: 225760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:07:00,474-Speed 3057.76 samples/sec   Loss 1.0641   LearningRate 0.0008   Epoch: 18   Global Step: 225770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:07:03,808-Speed 3072.87 samples/sec   Loss 1.0469   LearningRate 0.0008   Epoch: 18   Global Step: 225780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:07:07,192-Speed 3027.13 samples/sec   Loss 1.0869   LearningRate 0.0008   Epoch: 18   Global Step: 225790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:07:10,555-Speed 3045.38 samples/sec   Loss 1.0968   LearningRate 0.0008   Epoch: 18   Global Step: 225800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:07:13,972-Speed 2998.45 samples/sec   Loss 1.0450   LearningRate 0.0008   Epoch: 18   Global Step: 225810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:07:17,315-Speed 3064.14 samples/sec   Loss 1.0794   LearningRate 0.0008   Epoch: 18   Global Step: 225820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:07:20,620-Speed 3099.52 samples/sec   Loss 1.0821   LearningRate 0.0008   Epoch: 18   Global Step: 225830   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:07:24,007-Speed 3024.65 samples/sec   Loss 1.0985   LearningRate 0.0008   Epoch: 18   Global Step: 225840   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:07:27,376-Speed 3040.62 samples/sec   Loss 1.0769   LearningRate 0.0008   Epoch: 18   Global Step: 225850   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:07:30,716-Speed 3066.25 samples/sec   Loss 1.0464   LearningRate 0.0008   Epoch: 18   Global Step: 225860   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:07:34,174-Speed 2961.81 samples/sec   Loss 1.0820   LearningRate 0.0008   Epoch: 18   Global Step: 225870   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:07:37,520-Speed 3062.10 samples/sec   Loss 1.0861   LearningRate 0.0008   Epoch: 18   Global Step: 225880   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:07:40,850-Speed 3076.43 samples/sec   Loss 1.0690   LearningRate 0.0008   Epoch: 18   Global Step: 225890   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:07:44,235-Speed 3025.46 samples/sec   Loss 1.0304   LearningRate 0.0008   Epoch: 18   Global Step: 225900   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:07:47,639-Speed 3009.18 samples/sec   Loss 1.0289   LearningRate 0.0008   Epoch: 18   Global Step: 225910   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:07:51,021-Speed 3028.71 samples/sec   Loss 1.0839   LearningRate 0.0008   Epoch: 18   Global Step: 225920   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:07:54,373-Speed 3056.03 samples/sec   Loss 1.1152   LearningRate 0.0008   Epoch: 18   Global Step: 225930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:07:57,771-Speed 3015.18 samples/sec   Loss 1.0510   LearningRate 0.0008   Epoch: 18   Global Step: 225940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:08:01,209-Speed 2979.54 samples/sec   Loss 1.0619   LearningRate 0.0008   Epoch: 18   Global Step: 225950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:08:04,576-Speed 3042.30 samples/sec   Loss 1.0826   LearningRate 0.0008   Epoch: 18   Global Step: 225960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:08:07,946-Speed 3039.01 samples/sec   Loss 1.0416   LearningRate 0.0008   Epoch: 18   Global Step: 225970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:08:11,319-Speed 3036.44 samples/sec   Loss 1.1075   LearningRate 0.0008   Epoch: 18   Global Step: 225980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:08:14,726-Speed 3006.92 samples/sec   Loss 1.1235   LearningRate 0.0008   Epoch: 18   Global Step: 225990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:08:18,141-Speed 2998.90 samples/sec   Loss 1.0633   LearningRate 0.0008   Epoch: 18   Global Step: 226000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:08:21,571-Speed 2987.05 samples/sec   Loss 1.0940   LearningRate 0.0008   Epoch: 18   Global Step: 226010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:08:25,002-Speed 2984.98 samples/sec   Loss 1.0685   LearningRate 0.0008   Epoch: 18   Global Step: 226020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:08:28,381-Speed 3031.72 samples/sec   Loss 1.0933   LearningRate 0.0008   Epoch: 18   Global Step: 226030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:08:31,870-Speed 2935.92 samples/sec   Loss 1.0960   LearningRate 0.0008   Epoch: 18   Global Step: 226040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:08:35,235-Speed 3043.38 samples/sec   Loss 1.0466   LearningRate 0.0008   Epoch: 18   Global Step: 226050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:08:38,629-Speed 3018.37 samples/sec   Loss 1.1115   LearningRate 0.0008   Epoch: 18   Global Step: 226060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:08:41,991-Speed 3046.30 samples/sec   Loss 1.0667   LearningRate 0.0008   Epoch: 18   Global Step: 226070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:08:45,331-Speed 3067.36 samples/sec   Loss 1.0966   LearningRate 0.0008   Epoch: 18   Global Step: 226080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:08:48,824-Speed 2932.26 samples/sec   Loss 1.0951   LearningRate 0.0008   Epoch: 18   Global Step: 226090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:08:52,308-Speed 2940.10 samples/sec   Loss 1.1161   LearningRate 0.0008   Epoch: 18   Global Step: 226100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:08:55,762-Speed 2965.23 samples/sec   Loss 1.1029   LearningRate 0.0008   Epoch: 18   Global Step: 226110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:08:59,131-Speed 3040.13 samples/sec   Loss 1.0545   LearningRate 0.0008   Epoch: 18   Global Step: 226120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:09:02,558-Speed 2989.67 samples/sec   Loss 1.1308   LearningRate 0.0008   Epoch: 18   Global Step: 226130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:09:05,951-Speed 3018.39 samples/sec   Loss 1.0542   LearningRate 0.0008   Epoch: 18   Global Step: 226140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:09:09,337-Speed 3025.11 samples/sec   Loss 1.0861   LearningRate 0.0008   Epoch: 18   Global Step: 226150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:09:12,767-Speed 2986.96 samples/sec   Loss 1.0780   LearningRate 0.0008   Epoch: 18   Global Step: 226160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:09:16,097-Speed 3075.83 samples/sec   Loss 1.1034   LearningRate 0.0008   Epoch: 18   Global Step: 226170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:09:19,482-Speed 3026.55 samples/sec   Loss 1.0785   LearningRate 0.0008   Epoch: 18   Global Step: 226180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:09:22,912-Speed 2986.49 samples/sec   Loss 1.0893   LearningRate 0.0008   Epoch: 18   Global Step: 226190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:09:26,337-Speed 2990.89 samples/sec   Loss 1.0984   LearningRate 0.0008   Epoch: 18   Global Step: 226200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:09:29,744-Speed 3006.54 samples/sec   Loss 1.0741   LearningRate 0.0008   Epoch: 18   Global Step: 226210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:09:33,097-Speed 3054.81 samples/sec   Loss 1.0750   LearningRate 0.0008   Epoch: 18   Global Step: 226220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:09:36,452-Speed 3052.74 samples/sec   Loss 1.1096   LearningRate 0.0008   Epoch: 18   Global Step: 226230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:09:39,823-Speed 3038.70 samples/sec   Loss 1.0785   LearningRate 0.0008   Epoch: 18   Global Step: 226240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:09:43,234-Speed 3003.72 samples/sec   Loss 1.0625   LearningRate 0.0008   Epoch: 18   Global Step: 226250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:09:46,611-Speed 3032.58 samples/sec   Loss 1.0963   LearningRate 0.0008   Epoch: 18   Global Step: 226260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:09:49,962-Speed 3056.53 samples/sec   Loss 1.1098   LearningRate 0.0008   Epoch: 18   Global Step: 226270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:09:53,352-Speed 3021.49 samples/sec   Loss 1.0910   LearningRate 0.0008   Epoch: 18   Global Step: 226280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:09:56,724-Speed 3037.90 samples/sec   Loss 1.0642   LearningRate 0.0008   Epoch: 18   Global Step: 226290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:10:00,158-Speed 2982.67 samples/sec   Loss 1.0994   LearningRate 0.0008   Epoch: 18   Global Step: 226300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:10:03,528-Speed 3039.80 samples/sec   Loss 1.0569   LearningRate 0.0008   Epoch: 18   Global Step: 226310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:10:06,848-Speed 3084.25 samples/sec   Loss 1.0585   LearningRate 0.0008   Epoch: 18   Global Step: 226320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:10:10,175-Speed 3079.50 samples/sec   Loss 1.1120   LearningRate 0.0008   Epoch: 18   Global Step: 226330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:10:13,601-Speed 2989.95 samples/sec   Loss 1.1084   LearningRate 0.0008   Epoch: 18   Global Step: 226340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:10:17,002-Speed 3011.90 samples/sec   Loss 1.1350   LearningRate 0.0008   Epoch: 18   Global Step: 226350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:10:20,374-Speed 3037.22 samples/sec   Loss 1.0924   LearningRate 0.0008   Epoch: 18   Global Step: 226360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:10:23,717-Speed 3064.26 samples/sec   Loss 1.0926   LearningRate 0.0008   Epoch: 18   Global Step: 226370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:10:27,122-Speed 3007.58 samples/sec   Loss 1.0853   LearningRate 0.0008   Epoch: 18   Global Step: 226380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:10:30,497-Speed 3036.02 samples/sec   Loss 1.1006   LearningRate 0.0008   Epoch: 18   Global Step: 226390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:10:33,845-Speed 3058.55 samples/sec   Loss 1.1319   LearningRate 0.0008   Epoch: 18   Global Step: 226400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:10:37,276-Speed 2985.96 samples/sec   Loss 1.0778   LearningRate 0.0008   Epoch: 18   Global Step: 226410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:10:40,685-Speed 3004.72 samples/sec   Loss 1.0920   LearningRate 0.0008   Epoch: 18   Global Step: 226420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:10:44,004-Speed 3086.74 samples/sec   Loss 1.1013   LearningRate 0.0008   Epoch: 18   Global Step: 226430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:10:47,387-Speed 3027.11 samples/sec   Loss 1.0768   LearningRate 0.0008   Epoch: 18   Global Step: 226440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:10:50,783-Speed 3016.39 samples/sec   Loss 1.1146   LearningRate 0.0008   Epoch: 18   Global Step: 226450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:10:54,113-Speed 3075.42 samples/sec   Loss 1.1195   LearningRate 0.0008   Epoch: 18   Global Step: 226460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:10:57,495-Speed 3029.02 samples/sec   Loss 1.0885   LearningRate 0.0008   Epoch: 18   Global Step: 226470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:11:00,947-Speed 2967.07 samples/sec   Loss 1.1325   LearningRate 0.0008   Epoch: 18   Global Step: 226480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:11:04,320-Speed 3036.66 samples/sec   Loss 1.1211   LearningRate 0.0008   Epoch: 18   Global Step: 226490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:11:07,684-Speed 3044.92 samples/sec   Loss 1.1050   LearningRate 0.0008   Epoch: 18   Global Step: 226500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:11:11,051-Speed 3041.98 samples/sec   Loss 1.0907   LearningRate 0.0008   Epoch: 18   Global Step: 226510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:11:14,416-Speed 3044.74 samples/sec   Loss 1.0936   LearningRate 0.0008   Epoch: 18   Global Step: 226520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:11:17,842-Speed 2989.31 samples/sec   Loss 1.1096   LearningRate 0.0008   Epoch: 18   Global Step: 226530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:11:21,268-Speed 2989.34 samples/sec   Loss 1.0340   LearningRate 0.0008   Epoch: 18   Global Step: 226540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:11:24,713-Speed 2973.21 samples/sec   Loss 1.0652   LearningRate 0.0008   Epoch: 18   Global Step: 226550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:11:28,089-Speed 3034.16 samples/sec   Loss 1.0705   LearningRate 0.0008   Epoch: 18   Global Step: 226560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:11:31,476-Speed 3024.16 samples/sec   Loss 1.1251   LearningRate 0.0008   Epoch: 18   Global Step: 226570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:11:34,771-Speed 3108.82 samples/sec   Loss 1.1252   LearningRate 0.0008   Epoch: 18   Global Step: 226580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:11:38,215-Speed 2974.05 samples/sec   Loss 1.0957   LearningRate 0.0008   Epoch: 18   Global Step: 226590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:11:41,690-Speed 2947.09 samples/sec   Loss 1.1129   LearningRate 0.0008   Epoch: 18   Global Step: 226600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:11:45,079-Speed 3022.54 samples/sec   Loss 1.0776   LearningRate 0.0008   Epoch: 18   Global Step: 226610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:11:48,534-Speed 2964.90 samples/sec   Loss 1.1298   LearningRate 0.0008   Epoch: 18   Global Step: 226620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:11:51,879-Speed 3062.30 samples/sec   Loss 1.0855   LearningRate 0.0008   Epoch: 18   Global Step: 226630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:11:55,229-Speed 3057.46 samples/sec   Loss 1.1374   LearningRate 0.0008   Epoch: 18   Global Step: 226640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:11:58,612-Speed 3027.66 samples/sec   Loss 1.0443   LearningRate 0.0008   Epoch: 18   Global Step: 226650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:02,031-Speed 2995.67 samples/sec   Loss 1.0945   LearningRate 0.0008   Epoch: 18   Global Step: 226660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:05,399-Speed 3041.57 samples/sec   Loss 1.1028   LearningRate 0.0008   Epoch: 18   Global Step: 226670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:08,731-Speed 3074.23 samples/sec   Loss 1.0879   LearningRate 0.0008   Epoch: 18   Global Step: 226680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:12:12,102-Speed 3038.38 samples/sec   Loss 1.0364   LearningRate 0.0008   Epoch: 18   Global Step: 226690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:15,472-Speed 3038.73 samples/sec   Loss 1.0722   LearningRate 0.0008   Epoch: 18   Global Step: 226700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:18,891-Speed 2995.90 samples/sec   Loss 1.1222   LearningRate 0.0008   Epoch: 18   Global Step: 226710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:22,370-Speed 2944.53 samples/sec   Loss 1.0972   LearningRate 0.0008   Epoch: 18   Global Step: 226720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:25,741-Speed 3038.13 samples/sec   Loss 1.1026   LearningRate 0.0008   Epoch: 18   Global Step: 226730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:29,096-Speed 3052.71 samples/sec   Loss 1.1317   LearningRate 0.0008   Epoch: 18   Global Step: 226740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:32,489-Speed 3019.29 samples/sec   Loss 1.0848   LearningRate 0.0008   Epoch: 18   Global Step: 226750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:35,908-Speed 2996.00 samples/sec   Loss 1.1140   LearningRate 0.0008   Epoch: 18   Global Step: 226760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:39,286-Speed 3032.05 samples/sec   Loss 1.0573   LearningRate 0.0008   Epoch: 18   Global Step: 226770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:42,661-Speed 3034.40 samples/sec   Loss 1.1497   LearningRate 0.0008   Epoch: 18   Global Step: 226780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:12:46,028-Speed 3042.29 samples/sec   Loss 1.1069   LearningRate 0.0008   Epoch: 18   Global Step: 226790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:12:49,362-Speed 3073.21 samples/sec   Loss 1.0661   LearningRate 0.0008   Epoch: 18   Global Step: 226800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:12:52,724-Speed 3046.67 samples/sec   Loss 1.0589   LearningRate 0.0008   Epoch: 18   Global Step: 226810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:12:56,114-Speed 3020.93 samples/sec   Loss 1.1158   LearningRate 0.0008   Epoch: 18   Global Step: 226820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:12:59,462-Speed 3059.61 samples/sec   Loss 1.0818   LearningRate 0.0008   Epoch: 18   Global Step: 226830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:13:02,807-Speed 3062.47 samples/sec   Loss 1.0805   LearningRate 0.0008   Epoch: 18   Global Step: 226840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:13:06,227-Speed 2994.45 samples/sec   Loss 1.0819   LearningRate 0.0008   Epoch: 18   Global Step: 226850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:13:09,662-Speed 2981.92 samples/sec   Loss 1.0889   LearningRate 0.0008   Epoch: 18   Global Step: 226860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:13:12,981-Speed 3087.05 samples/sec   Loss 1.0510   LearningRate 0.0008   Epoch: 18   Global Step: 226870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:13:16,342-Speed 3046.96 samples/sec   Loss 1.1229   LearningRate 0.0008   Epoch: 18   Global Step: 226880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:13:19,694-Speed 3056.45 samples/sec   Loss 1.0941   LearningRate 0.0008   Epoch: 18   Global Step: 226890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:13:23,013-Speed 3086.14 samples/sec   Loss 1.0805   LearningRate 0.0008   Epoch: 18   Global Step: 226900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:13:26,358-Speed 3062.57 samples/sec   Loss 1.0732   LearningRate 0.0007   Epoch: 18   Global Step: 226910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:13:29,689-Speed 3074.74 samples/sec   Loss 1.0779   LearningRate 0.0007   Epoch: 18   Global Step: 226920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:13:33,061-Speed 3038.02 samples/sec   Loss 1.0957   LearningRate 0.0007   Epoch: 18   Global Step: 226930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:13:36,393-Speed 3073.66 samples/sec   Loss 1.1389   LearningRate 0.0007   Epoch: 18   Global Step: 226940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:13:39,757-Speed 3045.22 samples/sec   Loss 1.1414   LearningRate 0.0007   Epoch: 18   Global Step: 226950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:13:43,104-Speed 3059.98 samples/sec   Loss 1.0775   LearningRate 0.0007   Epoch: 18   Global Step: 226960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:13:46,467-Speed 3046.25 samples/sec   Loss 1.1047   LearningRate 0.0007   Epoch: 18   Global Step: 226970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:13:49,883-Speed 2998.24 samples/sec   Loss 1.1084   LearningRate 0.0007   Epoch: 18   Global Step: 226980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:13:53,311-Speed 2988.31 samples/sec   Loss 1.0917   LearningRate 0.0007   Epoch: 18   Global Step: 226990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:13:56,819-Speed 2919.75 samples/sec   Loss 1.0920   LearningRate 0.0007   Epoch: 18   Global Step: 227000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:14:00,152-Speed 3073.72 samples/sec   Loss 1.0219   LearningRate 0.0007   Epoch: 18   Global Step: 227010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:14:03,523-Speed 3037.74 samples/sec   Loss 1.0760   LearningRate 0.0007   Epoch: 18   Global Step: 227020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:14:06,847-Speed 3081.83 samples/sec   Loss 1.1339   LearningRate 0.0007   Epoch: 18   Global Step: 227030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:14:10,198-Speed 3056.70 samples/sec   Loss 1.1078   LearningRate 0.0007   Epoch: 18   Global Step: 227040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:14:13,560-Speed 3046.28 samples/sec   Loss 1.1200   LearningRate 0.0007   Epoch: 18   Global Step: 227050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:14:16,904-Speed 3063.32 samples/sec   Loss 1.0603   LearningRate 0.0007   Epoch: 18   Global Step: 227060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:14:20,294-Speed 3020.90 samples/sec   Loss 1.0709   LearningRate 0.0007   Epoch: 18   Global Step: 227070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:14:23,690-Speed 3016.30 samples/sec   Loss 1.1086   LearningRate 0.0007   Epoch: 18   Global Step: 227080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:14:27,016-Speed 3079.77 samples/sec   Loss 1.0866   LearningRate 0.0007   Epoch: 18   Global Step: 227090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:14:30,439-Speed 2992.68 samples/sec   Loss 1.1070   LearningRate 0.0007   Epoch: 18   Global Step: 227100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:14:33,821-Speed 3028.29 samples/sec   Loss 1.1365   LearningRate 0.0007   Epoch: 18   Global Step: 227110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:14:37,189-Speed 3040.99 samples/sec   Loss 1.0601   LearningRate 0.0007   Epoch: 18   Global Step: 227120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:14:40,547-Speed 3050.37 samples/sec   Loss 1.1241   LearningRate 0.0007   Epoch: 18   Global Step: 227130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:14:43,890-Speed 3064.53 samples/sec   Loss 1.1100   LearningRate 0.0007   Epoch: 18   Global Step: 227140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:14:47,201-Speed 3093.33 samples/sec   Loss 1.1183   LearningRate 0.0007   Epoch: 18   Global Step: 227150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:14:50,527-Speed 3079.57 samples/sec   Loss 1.0298   LearningRate 0.0007   Epoch: 18   Global Step: 227160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:14:53,855-Speed 3078.44 samples/sec   Loss 1.1200   LearningRate 0.0007   Epoch: 18   Global Step: 227170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:14:57,206-Speed 3056.90 samples/sec   Loss 1.1168   LearningRate 0.0007   Epoch: 18   Global Step: 227180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:15:00,651-Speed 2972.63 samples/sec   Loss 1.0942   LearningRate 0.0007   Epoch: 18   Global Step: 227190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:15:04,012-Speed 3047.93 samples/sec   Loss 1.1368   LearningRate 0.0007   Epoch: 18   Global Step: 227200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:15:07,333-Speed 3084.54 samples/sec   Loss 1.0885   LearningRate 0.0007   Epoch: 18   Global Step: 227210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:15:10,713-Speed 3030.51 samples/sec   Loss 1.1181   LearningRate 0.0007   Epoch: 18   Global Step: 227220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:15:14,103-Speed 3022.15 samples/sec   Loss 1.1359   LearningRate 0.0007   Epoch: 18   Global Step: 227230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:15:17,461-Speed 3050.23 samples/sec   Loss 1.1225   LearningRate 0.0007   Epoch: 18   Global Step: 227240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:15:20,855-Speed 3018.16 samples/sec   Loss 1.1087   LearningRate 0.0007   Epoch: 18   Global Step: 227250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:15:24,244-Speed 3022.49 samples/sec   Loss 1.1090   LearningRate 0.0007   Epoch: 18   Global Step: 227260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:15:27,671-Speed 2988.29 samples/sec   Loss 1.0962   LearningRate 0.0007   Epoch: 18   Global Step: 227270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:15:31,045-Speed 3035.89 samples/sec   Loss 1.1070   LearningRate 0.0007   Epoch: 18   Global Step: 227280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:15:34,501-Speed 2963.78 samples/sec   Loss 1.1125   LearningRate 0.0007   Epoch: 18   Global Step: 227290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:15:37,950-Speed 2970.11 samples/sec   Loss 1.1517   LearningRate 0.0007   Epoch: 18   Global Step: 227300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:15:41,358-Speed 3004.95 samples/sec   Loss 1.1311   LearningRate 0.0007   Epoch: 18   Global Step: 227310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:15:44,837-Speed 2944.74 samples/sec   Loss 1.0959   LearningRate 0.0007   Epoch: 18   Global Step: 227320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:15:48,280-Speed 2974.57 samples/sec   Loss 1.0931   LearningRate 0.0007   Epoch: 18   Global Step: 227330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:15:51,714-Speed 2983.32 samples/sec   Loss 1.1201   LearningRate 0.0007   Epoch: 18   Global Step: 227340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:15:55,084-Speed 3040.00 samples/sec   Loss 1.1040   LearningRate 0.0007   Epoch: 18   Global Step: 227350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:15:58,558-Speed 2948.32 samples/sec   Loss 1.0952   LearningRate 0.0007   Epoch: 18   Global Step: 227360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:16:01,974-Speed 2998.35 samples/sec   Loss 1.1116   LearningRate 0.0007   Epoch: 18   Global Step: 227370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:16:05,415-Speed 2976.95 samples/sec   Loss 1.1071   LearningRate 0.0007   Epoch: 18   Global Step: 227380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:16:08,824-Speed 3004.56 samples/sec   Loss 1.1021   LearningRate 0.0007   Epoch: 18   Global Step: 227390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:16:12,265-Speed 2977.05 samples/sec   Loss 1.1405   LearningRate 0.0007   Epoch: 18   Global Step: 227400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:16:15,692-Speed 2988.49 samples/sec   Loss 1.0892   LearningRate 0.0007   Epoch: 18   Global Step: 227410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:16:19,036-Speed 3063.35 samples/sec   Loss 1.1317   LearningRate 0.0007   Epoch: 18   Global Step: 227420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:16:22,475-Speed 2978.66 samples/sec   Loss 1.0832   LearningRate 0.0007   Epoch: 18   Global Step: 227430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:16:25,844-Speed 3040.11 samples/sec   Loss 1.0876   LearningRate 0.0007   Epoch: 18   Global Step: 227440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:16:29,251-Speed 3007.56 samples/sec   Loss 1.1259   LearningRate 0.0007   Epoch: 18   Global Step: 227450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:16:32,661-Speed 3003.33 samples/sec   Loss 1.1633   LearningRate 0.0007   Epoch: 18   Global Step: 227460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:16:36,085-Speed 2991.29 samples/sec   Loss 1.1020   LearningRate 0.0007   Epoch: 18   Global Step: 227470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:16:39,485-Speed 3012.61 samples/sec   Loss 1.0629   LearningRate 0.0007   Epoch: 18   Global Step: 227480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:16:42,928-Speed 2975.25 samples/sec   Loss 1.1044   LearningRate 0.0007   Epoch: 18   Global Step: 227490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:16:46,344-Speed 2998.49 samples/sec   Loss 1.0958   LearningRate 0.0007   Epoch: 18   Global Step: 227500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:16:49,744-Speed 3012.03 samples/sec   Loss 1.1279   LearningRate 0.0007   Epoch: 18   Global Step: 227510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:16:53,176-Speed 2985.07 samples/sec   Loss 1.1205   LearningRate 0.0007   Epoch: 18   Global Step: 227520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:16:56,569-Speed 3018.04 samples/sec   Loss 1.0605   LearningRate 0.0007   Epoch: 18   Global Step: 227530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:16:59,957-Speed 3023.77 samples/sec   Loss 1.0941   LearningRate 0.0007   Epoch: 18   Global Step: 227540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:17:03,459-Speed 2925.01 samples/sec   Loss 1.1051   LearningRate 0.0007   Epoch: 18   Global Step: 227550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:17:06,816-Speed 3051.34 samples/sec   Loss 1.0883   LearningRate 0.0007   Epoch: 18   Global Step: 227560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:17:10,202-Speed 3024.50 samples/sec   Loss 1.1239   LearningRate 0.0007   Epoch: 18   Global Step: 227570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:17:13,562-Speed 3048.82 samples/sec   Loss 1.0618   LearningRate 0.0007   Epoch: 18   Global Step: 227580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:17:16,977-Speed 2999.41 samples/sec   Loss 1.1112   LearningRate 0.0007   Epoch: 18   Global Step: 227590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:17:20,328-Speed 3057.22 samples/sec   Loss 1.0834   LearningRate 0.0007   Epoch: 18   Global Step: 227600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:17:23,710-Speed 3028.07 samples/sec   Loss 1.1335   LearningRate 0.0007   Epoch: 18   Global Step: 227610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:17:27,136-Speed 2989.98 samples/sec   Loss 1.0811   LearningRate 0.0007   Epoch: 18   Global Step: 227620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:17:30,499-Speed 3045.97 samples/sec   Loss 1.0659   LearningRate 0.0007   Epoch: 18   Global Step: 227630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:17:33,813-Speed 3090.86 samples/sec   Loss 1.1121   LearningRate 0.0007   Epoch: 18   Global Step: 227640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:17:37,125-Speed 3092.51 samples/sec   Loss 1.1448   LearningRate 0.0007   Epoch: 18   Global Step: 227650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:17:40,454-Speed 3077.05 samples/sec   Loss 1.1032   LearningRate 0.0007   Epoch: 18   Global Step: 227660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:17:43,804-Speed 3057.73 samples/sec   Loss 1.0837   LearningRate 0.0007   Epoch: 18   Global Step: 227670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:17:47,134-Speed 3075.64 samples/sec   Loss 1.0764   LearningRate 0.0007   Epoch: 18   Global Step: 227680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:17:50,532-Speed 3014.68 samples/sec   Loss 1.0992   LearningRate 0.0007   Epoch: 18   Global Step: 227690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:17:53,896-Speed 3044.74 samples/sec   Loss 1.0864   LearningRate 0.0007   Epoch: 18   Global Step: 227700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:17:57,280-Speed 3026.49 samples/sec   Loss 1.0903   LearningRate 0.0007   Epoch: 18   Global Step: 227710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:18:00,720-Speed 2977.56 samples/sec   Loss 1.1241   LearningRate 0.0007   Epoch: 18   Global Step: 227720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:18:04,114-Speed 3018.32 samples/sec   Loss 1.0325   LearningRate 0.0007   Epoch: 18   Global Step: 227730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:07,531-Speed 2996.70 samples/sec   Loss 1.1037   LearningRate 0.0007   Epoch: 18   Global Step: 227740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:10,890-Speed 3049.79 samples/sec   Loss 1.0998   LearningRate 0.0007   Epoch: 18   Global Step: 227750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:14,205-Speed 3089.82 samples/sec   Loss 1.0825   LearningRate 0.0007   Epoch: 18   Global Step: 227760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:17,606-Speed 3011.49 samples/sec   Loss 1.1276   LearningRate 0.0007   Epoch: 18   Global Step: 227770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:21,009-Speed 3010.68 samples/sec   Loss 1.0934   LearningRate 0.0007   Epoch: 18   Global Step: 227780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:24,454-Speed 2972.50 samples/sec   Loss 1.0958   LearningRate 0.0007   Epoch: 18   Global Step: 227790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:27,847-Speed 3019.01 samples/sec   Loss 1.0889   LearningRate 0.0007   Epoch: 18   Global Step: 227800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:31,224-Speed 3033.61 samples/sec   Loss 1.0742   LearningRate 0.0007   Epoch: 18   Global Step: 227810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:34,598-Speed 3035.70 samples/sec   Loss 1.0435   LearningRate 0.0007   Epoch: 18   Global Step: 227820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:37,985-Speed 3024.07 samples/sec   Loss 1.1108   LearningRate 0.0007   Epoch: 18   Global Step: 227830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:18:41,437-Speed 2966.82 samples/sec   Loss 1.1295   LearningRate 0.0007   Epoch: 18   Global Step: 227840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:18:44,809-Speed 3037.44 samples/sec   Loss 1.1160   LearningRate 0.0007   Epoch: 18   Global Step: 227850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:48,196-Speed 3024.10 samples/sec   Loss 1.1030   LearningRate 0.0007   Epoch: 18   Global Step: 227860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:51,603-Speed 3006.64 samples/sec   Loss 1.1043   LearningRate 0.0007   Epoch: 18   Global Step: 227870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:54,941-Speed 3068.65 samples/sec   Loss 1.0967   LearningRate 0.0007   Epoch: 18   Global Step: 227880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:18:58,318-Speed 3033.64 samples/sec   Loss 1.1592   LearningRate 0.0007   Epoch: 18   Global Step: 227890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:19:01,735-Speed 2997.72 samples/sec   Loss 1.1043   LearningRate 0.0007   Epoch: 18   Global Step: 227900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:19:05,153-Speed 2996.46 samples/sec   Loss 1.1204   LearningRate 0.0007   Epoch: 18   Global Step: 227910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:19:08,625-Speed 2950.18 samples/sec   Loss 1.0874   LearningRate 0.0007   Epoch: 18   Global Step: 227920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:19:12,026-Speed 3012.00 samples/sec   Loss 1.0801   LearningRate 0.0007   Epoch: 18   Global Step: 227930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:19:15,373-Speed 3060.16 samples/sec   Loss 1.0873   LearningRate 0.0007   Epoch: 18   Global Step: 227940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:19:18,766-Speed 3018.70 samples/sec   Loss 1.1272   LearningRate 0.0007   Epoch: 18   Global Step: 227950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:19:22,111-Speed 3061.56 samples/sec   Loss 1.1106   LearningRate 0.0007   Epoch: 18   Global Step: 227960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:19:25,445-Speed 3071.93 samples/sec   Loss 1.1063   LearningRate 0.0007   Epoch: 18   Global Step: 227970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:19:28,805-Speed 3049.49 samples/sec   Loss 1.0820   LearningRate 0.0007   Epoch: 18   Global Step: 227980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:19:32,183-Speed 3031.97 samples/sec   Loss 1.1356   LearningRate 0.0007   Epoch: 18   Global Step: 227990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:19:35,562-Speed 3030.93 samples/sec   Loss 1.0837   LearningRate 0.0007   Epoch: 18   Global Step: 228000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:19:38,905-Speed 3064.55 samples/sec   Loss 1.1311   LearningRate 0.0007   Epoch: 18   Global Step: 228010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:19:42,328-Speed 2991.79 samples/sec   Loss 1.1039   LearningRate 0.0007   Epoch: 18   Global Step: 228020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:19:45,740-Speed 3002.33 samples/sec   Loss 1.0975   LearningRate 0.0007   Epoch: 18   Global Step: 228030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:19:49,089-Speed 3058.58 samples/sec   Loss 1.0938   LearningRate 0.0007   Epoch: 18   Global Step: 228040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:19:52,412-Speed 3081.88 samples/sec   Loss 1.1067   LearningRate 0.0007   Epoch: 18   Global Step: 228050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:19:55,751-Speed 3067.84 samples/sec   Loss 1.1008   LearningRate 0.0007   Epoch: 18   Global Step: 228060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:19:59,089-Speed 3069.19 samples/sec   Loss 1.1273   LearningRate 0.0007   Epoch: 18   Global Step: 228070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:20:02,442-Speed 3054.15 samples/sec   Loss 1.0999   LearningRate 0.0007   Epoch: 18   Global Step: 228080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:20:05,849-Speed 3007.25 samples/sec   Loss 1.1094   LearningRate 0.0007   Epoch: 18   Global Step: 228090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:20:09,246-Speed 3014.90 samples/sec   Loss 1.1008   LearningRate 0.0007   Epoch: 18   Global Step: 228100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:20:12,626-Speed 3030.03 samples/sec   Loss 1.0867   LearningRate 0.0007   Epoch: 18   Global Step: 228110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:20:15,985-Speed 3050.00 samples/sec   Loss 1.1044   LearningRate 0.0007   Epoch: 18   Global Step: 228120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:20:19,351-Speed 3043.05 samples/sec   Loss 1.1463   LearningRate 0.0007   Epoch: 18   Global Step: 228130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:20:22,755-Speed 3009.20 samples/sec   Loss 1.0831   LearningRate 0.0007   Epoch: 18   Global Step: 228140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:20:26,173-Speed 2996.08 samples/sec   Loss 1.1236   LearningRate 0.0007   Epoch: 18   Global Step: 228150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:20:29,509-Speed 3070.88 samples/sec   Loss 1.0343   LearningRate 0.0007   Epoch: 18   Global Step: 228160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:20:32,904-Speed 3017.31 samples/sec   Loss 1.0673   LearningRate 0.0007   Epoch: 18   Global Step: 228170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:20:36,276-Speed 3036.75 samples/sec   Loss 1.0937   LearningRate 0.0007   Epoch: 18   Global Step: 228180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:20:39,674-Speed 3014.79 samples/sec   Loss 1.1095   LearningRate 0.0007   Epoch: 18   Global Step: 228190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:20:43,062-Speed 3023.72 samples/sec   Loss 1.1484   LearningRate 0.0007   Epoch: 18   Global Step: 228200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:20:46,436-Speed 3035.42 samples/sec   Loss 1.1480   LearningRate 0.0007   Epoch: 18   Global Step: 228210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:20:49,961-Speed 2905.88 samples/sec   Loss 1.0979   LearningRate 0.0007   Epoch: 18   Global Step: 228220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:20:53,415-Speed 2965.65 samples/sec   Loss 1.1137   LearningRate 0.0007   Epoch: 18   Global Step: 228230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:20:56,776-Speed 3047.14 samples/sec   Loss 1.0988   LearningRate 0.0007   Epoch: 18   Global Step: 228240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:21:00,096-Speed 3086.16 samples/sec   Loss 1.1296   LearningRate 0.0007   Epoch: 18   Global Step: 228250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:21:03,494-Speed 3014.10 samples/sec   Loss 1.0759   LearningRate 0.0007   Epoch: 18   Global Step: 228260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:21:06,846-Speed 3055.94 samples/sec   Loss 1.1429   LearningRate 0.0007   Epoch: 18   Global Step: 228270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:21:10,217-Speed 3038.95 samples/sec   Loss 1.1332   LearningRate 0.0007   Epoch: 18   Global Step: 228280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:21:13,579-Speed 3045.92 samples/sec   Loss 1.0732   LearningRate 0.0007   Epoch: 18   Global Step: 228290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:21:17,013-Speed 2983.40 samples/sec   Loss 1.0788   LearningRate 0.0007   Epoch: 18   Global Step: 228300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:21:20,464-Speed 2968.67 samples/sec   Loss 1.0862   LearningRate 0.0007   Epoch: 18   Global Step: 228310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:21:23,879-Speed 2999.17 samples/sec   Loss 1.0588   LearningRate 0.0007   Epoch: 18   Global Step: 228320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:21:27,239-Speed 3048.22 samples/sec   Loss 1.1324   LearningRate 0.0007   Epoch: 18   Global Step: 228330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:21:30,757-Speed 2911.96 samples/sec   Loss 1.1176   LearningRate 0.0007   Epoch: 18   Global Step: 228340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:21:34,088-Speed 3075.31 samples/sec   Loss 1.0886   LearningRate 0.0007   Epoch: 18   Global Step: 228350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:21:37,484-Speed 3015.82 samples/sec   Loss 1.0895   LearningRate 0.0007   Epoch: 18   Global Step: 228360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:21:40,899-Speed 2999.78 samples/sec   Loss 1.1194   LearningRate 0.0007   Epoch: 18   Global Step: 228370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:21:44,316-Speed 2997.24 samples/sec   Loss 1.0668   LearningRate 0.0007   Epoch: 18   Global Step: 228380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:21:47,788-Speed 2950.23 samples/sec   Loss 1.0860   LearningRate 0.0007   Epoch: 18   Global Step: 228390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:21:51,190-Speed 3023.08 samples/sec   Loss 1.1127   LearningRate 0.0006   Epoch: 18   Global Step: 228400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:21:54,611-Speed 2994.08 samples/sec   Loss 1.0815   LearningRate 0.0006   Epoch: 18   Global Step: 228410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:21:57,979-Speed 3040.93 samples/sec   Loss 1.0808   LearningRate 0.0006   Epoch: 18   Global Step: 228420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:01,433-Speed 2965.78 samples/sec   Loss 1.1133   LearningRate 0.0006   Epoch: 18   Global Step: 228430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:04,818-Speed 3025.85 samples/sec   Loss 1.0857   LearningRate 0.0006   Epoch: 18   Global Step: 228440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:08,206-Speed 3023.27 samples/sec   Loss 1.0969   LearningRate 0.0006   Epoch: 18   Global Step: 228450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:22:11,611-Speed 3008.27 samples/sec   Loss 1.0903   LearningRate 0.0006   Epoch: 18   Global Step: 228460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:22:14,968-Speed 3051.82 samples/sec   Loss 1.0803   LearningRate 0.0006   Epoch: 18   Global Step: 228470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:18,392-Speed 2991.22 samples/sec   Loss 1.1423   LearningRate 0.0006   Epoch: 18   Global Step: 228480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:21,716-Speed 3082.12 samples/sec   Loss 1.1475   LearningRate 0.0006   Epoch: 18   Global Step: 228490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:25,149-Speed 2983.91 samples/sec   Loss 1.0984   LearningRate 0.0006   Epoch: 18   Global Step: 228500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:28,533-Speed 3027.13 samples/sec   Loss 1.1171   LearningRate 0.0006   Epoch: 18   Global Step: 228510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:31,982-Speed 2969.39 samples/sec   Loss 1.0928   LearningRate 0.0006   Epoch: 18   Global Step: 228520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:35,396-Speed 3000.35 samples/sec   Loss 1.0629   LearningRate 0.0006   Epoch: 18   Global Step: 228530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:38,845-Speed 2969.96 samples/sec   Loss 1.1182   LearningRate 0.0006   Epoch: 18   Global Step: 228540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:42,207-Speed 3046.53 samples/sec   Loss 1.1314   LearningRate 0.0006   Epoch: 18   Global Step: 228550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:45,644-Speed 2979.69 samples/sec   Loss 1.1175   LearningRate 0.0006   Epoch: 18   Global Step: 228560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:22:48,954-Speed 3094.86 samples/sec   Loss 1.1013   LearningRate 0.0006   Epoch: 18   Global Step: 228570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:22:52,357-Speed 3010.22 samples/sec   Loss 1.1028   LearningRate 0.0006   Epoch: 18   Global Step: 228580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:22:55,774-Speed 2997.39 samples/sec   Loss 1.1454   LearningRate 0.0006   Epoch: 18   Global Step: 228590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:22:59,101-Speed 3079.11 samples/sec   Loss 1.1086   LearningRate 0.0006   Epoch: 18   Global Step: 228600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:02,498-Speed 3015.08 samples/sec   Loss 1.0751   LearningRate 0.0006   Epoch: 18   Global Step: 228610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:05,907-Speed 3005.45 samples/sec   Loss 1.1307   LearningRate 0.0006   Epoch: 18   Global Step: 228620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:09,275-Speed 3040.71 samples/sec   Loss 1.1271   LearningRate 0.0006   Epoch: 18   Global Step: 228630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:12,720-Speed 2973.81 samples/sec   Loss 1.1370   LearningRate 0.0006   Epoch: 18   Global Step: 228640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:16,169-Speed 2969.27 samples/sec   Loss 1.1220   LearningRate 0.0006   Epoch: 18   Global Step: 228650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:19,604-Speed 2983.41 samples/sec   Loss 1.1360   LearningRate 0.0006   Epoch: 18   Global Step: 228660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:22,948-Speed 3062.89 samples/sec   Loss 1.1588   LearningRate 0.0006   Epoch: 18   Global Step: 228670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:26,353-Speed 3008.05 samples/sec   Loss 1.0996   LearningRate 0.0006   Epoch: 18   Global Step: 228680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:29,753-Speed 3012.92 samples/sec   Loss 1.1262   LearningRate 0.0006   Epoch: 18   Global Step: 228690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:33,116-Speed 3045.58 samples/sec   Loss 1.1787   LearningRate 0.0006   Epoch: 18   Global Step: 228700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:36,425-Speed 3095.38 samples/sec   Loss 1.0673   LearningRate 0.0006   Epoch: 18   Global Step: 228710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:39,770-Speed 3062.18 samples/sec   Loss 1.1175   LearningRate 0.0006   Epoch: 18   Global Step: 228720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:43,153-Speed 3027.04 samples/sec   Loss 1.1057   LearningRate 0.0006   Epoch: 18   Global Step: 228730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:46,557-Speed 3010.36 samples/sec   Loss 1.1081   LearningRate 0.0006   Epoch: 18   Global Step: 228740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:49,917-Speed 3047.82 samples/sec   Loss 1.1092   LearningRate 0.0006   Epoch: 18   Global Step: 228750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:53,367-Speed 2968.89 samples/sec   Loss 1.1305   LearningRate 0.0006   Epoch: 18   Global Step: 228760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:23:56,714-Speed 3060.76 samples/sec   Loss 1.0761   LearningRate 0.0006   Epoch: 18   Global Step: 228770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:24:00,133-Speed 2995.42 samples/sec   Loss 1.1108   LearningRate 0.0006   Epoch: 18   Global Step: 228780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:24:03,432-Speed 3105.40 samples/sec   Loss 1.0770   LearningRate 0.0006   Epoch: 18   Global Step: 228790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:24:06,801-Speed 3039.74 samples/sec   Loss 1.0986   LearningRate 0.0006   Epoch: 18   Global Step: 228800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:24:10,224-Speed 2992.91 samples/sec   Loss 1.1296   LearningRate 0.0006   Epoch: 18   Global Step: 228810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:24:13,686-Speed 2958.60 samples/sec   Loss 1.1741   LearningRate 0.0006   Epoch: 18   Global Step: 228820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:24:17,041-Speed 3053.07 samples/sec   Loss 1.1246   LearningRate 0.0006   Epoch: 18   Global Step: 228830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:24:20,397-Speed 3052.56 samples/sec   Loss 1.1690   LearningRate 0.0006   Epoch: 18   Global Step: 228840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:24:23,806-Speed 3004.65 samples/sec   Loss 1.0845   LearningRate 0.0006   Epoch: 18   Global Step: 228850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:24:27,215-Speed 3004.21 samples/sec   Loss 1.1382   LearningRate 0.0006   Epoch: 18   Global Step: 228860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:24:30,717-Speed 2925.20 samples/sec   Loss 1.1052   LearningRate 0.0006   Epoch: 18   Global Step: 228870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:24:34,108-Speed 3020.46 samples/sec   Loss 1.1060   LearningRate 0.0006   Epoch: 18   Global Step: 228880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:24:37,598-Speed 2934.39 samples/sec   Loss 1.1254   LearningRate 0.0006   Epoch: 18   Global Step: 228890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:24:41,063-Speed 2956.30 samples/sec   Loss 1.0807   LearningRate 0.0006   Epoch: 18   Global Step: 228900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:24:44,428-Speed 3043.56 samples/sec   Loss 1.1460   LearningRate 0.0006   Epoch: 18   Global Step: 228910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:24:47,815-Speed 3024.10 samples/sec   Loss 1.0614   LearningRate 0.0006   Epoch: 18   Global Step: 228920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:24:51,260-Speed 2973.91 samples/sec   Loss 1.1098   LearningRate 0.0006   Epoch: 18   Global Step: 228930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:24:54,603-Speed 3063.91 samples/sec   Loss 1.1211   LearningRate 0.0006   Epoch: 18   Global Step: 228940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:24:58,054-Speed 2968.03 samples/sec   Loss 1.1395   LearningRate 0.0006   Epoch: 18   Global Step: 228950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:25:01,402-Speed 3059.41 samples/sec   Loss 1.1516   LearningRate 0.0006   Epoch: 18   Global Step: 228960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:25:04,859-Speed 2963.38 samples/sec   Loss 1.0979   LearningRate 0.0006   Epoch: 18   Global Step: 228970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:25:08,313-Speed 2964.82 samples/sec   Loss 1.0881   LearningRate 0.0006   Epoch: 18   Global Step: 228980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:25:11,725-Speed 3001.49 samples/sec   Loss 1.0973   LearningRate 0.0006   Epoch: 18   Global Step: 228990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:25:15,146-Speed 2994.38 samples/sec   Loss 1.1250   LearningRate 0.0006   Epoch: 18   Global Step: 229000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:25:18,613-Speed 2954.49 samples/sec   Loss 1.0844   LearningRate 0.0006   Epoch: 18   Global Step: 229010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:25:21,977-Speed 3045.63 samples/sec   Loss 1.1378   LearningRate 0.0006   Epoch: 18   Global Step: 229020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:25:25,469-Speed 2932.88 samples/sec   Loss 1.0832   LearningRate 0.0006   Epoch: 18   Global Step: 229030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:25:28,972-Speed 2924.40 samples/sec   Loss 1.1526   LearningRate 0.0006   Epoch: 18   Global Step: 229040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:25:32,419-Speed 2971.99 samples/sec   Loss 1.1440   LearningRate 0.0006   Epoch: 18   Global Step: 229050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:25:35,820-Speed 3011.18 samples/sec   Loss 1.1031   LearningRate 0.0006   Epoch: 18   Global Step: 229060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:25:39,163-Speed 3064.35 samples/sec   Loss 1.1167   LearningRate 0.0006   Epoch: 18   Global Step: 229070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:25:42,498-Speed 3070.83 samples/sec   Loss 1.1222   LearningRate 0.0006   Epoch: 18   Global Step: 229080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:25:45,901-Speed 3011.15 samples/sec   Loss 1.1674   LearningRate 0.0006   Epoch: 18   Global Step: 229090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:25:49,345-Speed 2974.08 samples/sec   Loss 1.1529   LearningRate 0.0006   Epoch: 18   Global Step: 229100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:25:52,768-Speed 2992.21 samples/sec   Loss 1.1313   LearningRate 0.0006   Epoch: 18   Global Step: 229110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:25:56,184-Speed 2998.35 samples/sec   Loss 1.0565   LearningRate 0.0006   Epoch: 18   Global Step: 229120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:25:59,516-Speed 3074.90 samples/sec   Loss 1.1151   LearningRate 0.0006   Epoch: 18   Global Step: 229130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:02,867-Speed 3056.09 samples/sec   Loss 1.0975   LearningRate 0.0006   Epoch: 18   Global Step: 229140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:06,231-Speed 3044.78 samples/sec   Loss 1.1128   LearningRate 0.0006   Epoch: 18   Global Step: 229150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:09,683-Speed 2967.32 samples/sec   Loss 1.1317   LearningRate 0.0006   Epoch: 18   Global Step: 229160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:13,077-Speed 3018.15 samples/sec   Loss 1.1034   LearningRate 0.0006   Epoch: 18   Global Step: 229170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:16,507-Speed 2985.98 samples/sec   Loss 1.0619   LearningRate 0.0006   Epoch: 18   Global Step: 229180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:26:19,904-Speed 3015.81 samples/sec   Loss 1.1038   LearningRate 0.0006   Epoch: 18   Global Step: 229190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:26:23,272-Speed 3040.42 samples/sec   Loss 1.0875   LearningRate 0.0006   Epoch: 18   Global Step: 229200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:26,669-Speed 3015.41 samples/sec   Loss 1.1136   LearningRate 0.0006   Epoch: 18   Global Step: 229210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:30,028-Speed 3049.29 samples/sec   Loss 1.1179   LearningRate 0.0006   Epoch: 18   Global Step: 229220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:33,514-Speed 2938.70 samples/sec   Loss 1.1115   LearningRate 0.0006   Epoch: 18   Global Step: 229230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:36,862-Speed 3059.36 samples/sec   Loss 1.1051   LearningRate 0.0006   Epoch: 18   Global Step: 229240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:40,229-Speed 3042.22 samples/sec   Loss 1.1280   LearningRate 0.0006   Epoch: 18   Global Step: 229250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:43,533-Speed 3099.69 samples/sec   Loss 1.0992   LearningRate 0.0006   Epoch: 18   Global Step: 229260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:46,947-Speed 3000.51 samples/sec   Loss 1.0965   LearningRate 0.0006   Epoch: 18   Global Step: 229270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:50,438-Speed 2934.25 samples/sec   Loss 1.1280   LearningRate 0.0006   Epoch: 18   Global Step: 229280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:53,824-Speed 3025.05 samples/sec   Loss 1.0717   LearningRate 0.0006   Epoch: 18   Global Step: 229290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:26:57,258-Speed 2982.04 samples/sec   Loss 1.1464   LearningRate 0.0006   Epoch: 18   Global Step: 229300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:27:00,650-Speed 3020.35 samples/sec   Loss 1.1193   LearningRate 0.0006   Epoch: 18   Global Step: 229310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:27:04,021-Speed 3038.90 samples/sec   Loss 1.1004   LearningRate 0.0006   Epoch: 18   Global Step: 229320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:27:07,361-Speed 3066.55 samples/sec   Loss 1.1028   LearningRate 0.0006   Epoch: 18   Global Step: 229330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:27:10,677-Speed 3089.01 samples/sec   Loss 1.0830   LearningRate 0.0006   Epoch: 18   Global Step: 229340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:27:14,019-Speed 3064.53 samples/sec   Loss 1.1057   LearningRate 0.0006   Epoch: 18   Global Step: 229350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:27:17,484-Speed 2956.43 samples/sec   Loss 1.1154   LearningRate 0.0006   Epoch: 18   Global Step: 229360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:27:20,914-Speed 2985.94 samples/sec   Loss 1.1194   LearningRate 0.0006   Epoch: 18   Global Step: 229370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:27:24,299-Speed 3026.25 samples/sec   Loss 1.1597   LearningRate 0.0006   Epoch: 18   Global Step: 229380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:27:27,744-Speed 2972.91 samples/sec   Loss 1.1183   LearningRate 0.0006   Epoch: 18   Global Step: 229390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:27:31,184-Speed 2978.61 samples/sec   Loss 1.1769   LearningRate 0.0006   Epoch: 18   Global Step: 229400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:27:34,565-Speed 3028.59 samples/sec   Loss 1.1143   LearningRate 0.0006   Epoch: 18   Global Step: 229410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:27:38,024-Speed 2961.74 samples/sec   Loss 1.0920   LearningRate 0.0006   Epoch: 18   Global Step: 229420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:27:41,501-Speed 2945.95 samples/sec   Loss 1.1437   LearningRate 0.0006   Epoch: 18   Global Step: 229430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:27:44,827-Speed 3078.93 samples/sec   Loss 1.1313   LearningRate 0.0006   Epoch: 18   Global Step: 229440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:27:48,153-Speed 3079.55 samples/sec   Loss 1.0921   LearningRate 0.0006   Epoch: 18   Global Step: 229450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:27:51,546-Speed 3020.22 samples/sec   Loss 1.0737   LearningRate 0.0006   Epoch: 18   Global Step: 229460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:27:54,869-Speed 3083.12 samples/sec   Loss 1.1462   LearningRate 0.0006   Epoch: 18   Global Step: 229470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:27:58,220-Speed 3056.09 samples/sec   Loss 1.1396   LearningRate 0.0006   Epoch: 18   Global Step: 229480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:01,649-Speed 2987.16 samples/sec   Loss 1.1120   LearningRate 0.0006   Epoch: 18   Global Step: 229490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:05,043-Speed 3017.98 samples/sec   Loss 1.0800   LearningRate 0.0006   Epoch: 18   Global Step: 229500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:08,481-Speed 2979.18 samples/sec   Loss 1.1299   LearningRate 0.0006   Epoch: 18   Global Step: 229510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:11,848-Speed 3042.07 samples/sec   Loss 1.1173   LearningRate 0.0006   Epoch: 18   Global Step: 229520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:15,328-Speed 2943.49 samples/sec   Loss 1.1304   LearningRate 0.0006   Epoch: 18   Global Step: 229530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:18,765-Speed 2980.25 samples/sec   Loss 1.1313   LearningRate 0.0006   Epoch: 18   Global Step: 229540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:22,128-Speed 3045.95 samples/sec   Loss 1.1055   LearningRate 0.0006   Epoch: 18   Global Step: 229550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:25,502-Speed 3035.90 samples/sec   Loss 1.0789   LearningRate 0.0006   Epoch: 18   Global Step: 229560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:28,918-Speed 2998.75 samples/sec   Loss 1.1318   LearningRate 0.0006   Epoch: 18   Global Step: 229570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:32,364-Speed 2972.35 samples/sec   Loss 1.1113   LearningRate 0.0006   Epoch: 18   Global Step: 229580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:35,781-Speed 2997.25 samples/sec   Loss 1.0970   LearningRate 0.0006   Epoch: 18   Global Step: 229590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:39,215-Speed 2983.39 samples/sec   Loss 1.1043   LearningRate 0.0006   Epoch: 18   Global Step: 229600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:28:42,619-Speed 3008.28 samples/sec   Loss 1.0758   LearningRate 0.0006   Epoch: 18   Global Step: 229610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:28:45,971-Speed 3056.23 samples/sec   Loss 1.0842   LearningRate 0.0006   Epoch: 18   Global Step: 229620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:28:49,393-Speed 2993.31 samples/sec   Loss 1.0921   LearningRate 0.0006   Epoch: 18   Global Step: 229630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:28:52,799-Speed 3006.96 samples/sec   Loss 1.1041   LearningRate 0.0006   Epoch: 18   Global Step: 229640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:28:56,158-Speed 3049.10 samples/sec   Loss 1.1252   LearningRate 0.0006   Epoch: 18   Global Step: 229650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:28:59,540-Speed 3028.70 samples/sec   Loss 1.1033   LearningRate 0.0006   Epoch: 18   Global Step: 229660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:29:02,964-Speed 2991.47 samples/sec   Loss 1.1086   LearningRate 0.0006   Epoch: 18   Global Step: 229670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:29:06,325-Speed 3047.29 samples/sec   Loss 1.0677   LearningRate 0.0006   Epoch: 18   Global Step: 229680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:29:09,792-Speed 2955.13 samples/sec   Loss 1.1035   LearningRate 0.0006   Epoch: 18   Global Step: 229690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:29:13,223-Speed 2985.00 samples/sec   Loss 1.0645   LearningRate 0.0006   Epoch: 18   Global Step: 229700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:29:16,595-Speed 3037.90 samples/sec   Loss 1.0982   LearningRate 0.0006   Epoch: 18   Global Step: 229710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:29:19,986-Speed 3020.61 samples/sec   Loss 1.1068   LearningRate 0.0006   Epoch: 18   Global Step: 229720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:29:23,505-Speed 2910.68 samples/sec   Loss 1.1565   LearningRate 0.0006   Epoch: 18   Global Step: 229730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:29:26,915-Speed 3003.99 samples/sec   Loss 1.1110   LearningRate 0.0006   Epoch: 18   Global Step: 229740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:29:30,301-Speed 3024.96 samples/sec   Loss 1.1240   LearningRate 0.0006   Epoch: 18   Global Step: 229750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:29:33,676-Speed 3034.89 samples/sec   Loss 1.1014   LearningRate 0.0006   Epoch: 18   Global Step: 229760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:29:37,127-Speed 2968.59 samples/sec   Loss 1.0969   LearningRate 0.0006   Epoch: 18   Global Step: 229770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:29:40,490-Speed 3045.84 samples/sec   Loss 1.1308   LearningRate 0.0006   Epoch: 18   Global Step: 229780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:29:43,885-Speed 3017.09 samples/sec   Loss 1.1413   LearningRate 0.0006   Epoch: 18   Global Step: 229790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:29:47,280-Speed 3017.25 samples/sec   Loss 1.0503   LearningRate 0.0006   Epoch: 18   Global Step: 229800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:29:50,663-Speed 3027.34 samples/sec   Loss 1.1360   LearningRate 0.0006   Epoch: 18   Global Step: 229810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:29:54,060-Speed 3015.28 samples/sec   Loss 1.1119   LearningRate 0.0006   Epoch: 18   Global Step: 229820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:29:57,482-Speed 2993.33 samples/sec   Loss 1.1111   LearningRate 0.0006   Epoch: 18   Global Step: 229830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:30:00,976-Speed 2931.98 samples/sec   Loss 1.1424   LearningRate 0.0006   Epoch: 18   Global Step: 229840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:30:04,506-Speed 2901.37 samples/sec   Loss 1.1403   LearningRate 0.0006   Epoch: 18   Global Step: 229850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:30:07,935-Speed 2987.38 samples/sec   Loss 1.1203   LearningRate 0.0006   Epoch: 18   Global Step: 229860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:30:11,314-Speed 3031.35 samples/sec   Loss 1.1300   LearningRate 0.0006   Epoch: 18   Global Step: 229870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:30:14,771-Speed 2962.85 samples/sec   Loss 1.1542   LearningRate 0.0006   Epoch: 18   Global Step: 229880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:18,217-Speed 2972.60 samples/sec   Loss 1.1513   LearningRate 0.0006   Epoch: 18   Global Step: 229890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:21,665-Speed 2970.20 samples/sec   Loss 1.1260   LearningRate 0.0006   Epoch: 18   Global Step: 229900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:25,116-Speed 2967.67 samples/sec   Loss 1.1325   LearningRate 0.0006   Epoch: 18   Global Step: 229910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:28,665-Speed 2886.02 samples/sec   Loss 1.1055   LearningRate 0.0006   Epoch: 18   Global Step: 229920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:32,051-Speed 3025.04 samples/sec   Loss 1.0541   LearningRate 0.0006   Epoch: 18   Global Step: 229930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:35,435-Speed 3027.39 samples/sec   Loss 1.0863   LearningRate 0.0006   Epoch: 18   Global Step: 229940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:38,876-Speed 2976.15 samples/sec   Loss 1.1085   LearningRate 0.0006   Epoch: 18   Global Step: 229950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:42,199-Speed 3083.08 samples/sec   Loss 1.1020   LearningRate 0.0006   Epoch: 18   Global Step: 229960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:45,639-Speed 2977.05 samples/sec   Loss 1.0861   LearningRate 0.0006   Epoch: 18   Global Step: 229970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:49,007-Speed 3041.87 samples/sec   Loss 1.1173   LearningRate 0.0006   Epoch: 18   Global Step: 229980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:52,406-Speed 3013.50 samples/sec   Loss 1.1140   LearningRate 0.0006   Epoch: 18   Global Step: 229990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:55,827-Speed 2993.98 samples/sec   Loss 1.0825   LearningRate 0.0005   Epoch: 18   Global Step: 230000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:30:59,203-Speed 3034.02 samples/sec   Loss 1.1172   LearningRate 0.0005   Epoch: 18   Global Step: 230010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:31:02,552-Speed 3058.79 samples/sec   Loss 1.1422   LearningRate 0.0005   Epoch: 18   Global Step: 230020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:31:05,989-Speed 2980.24 samples/sec   Loss 1.1389   LearningRate 0.0005   Epoch: 18   Global Step: 230030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:31:09,330-Speed 3066.16 samples/sec   Loss 1.0808   LearningRate 0.0005   Epoch: 18   Global Step: 230040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:31:12,768-Speed 2979.29 samples/sec   Loss 1.1279   LearningRate 0.0005   Epoch: 18   Global Step: 230050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:31:16,173-Speed 3008.73 samples/sec   Loss 1.1407   LearningRate 0.0005   Epoch: 18   Global Step: 230060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:31:19,567-Speed 3017.54 samples/sec   Loss 1.0831   LearningRate 0.0005   Epoch: 18   Global Step: 230070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:31:22,914-Speed 3059.72 samples/sec   Loss 1.1420   LearningRate 0.0005   Epoch: 18   Global Step: 230080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:31:26,259-Speed 3062.78 samples/sec   Loss 1.0920   LearningRate 0.0005   Epoch: 18   Global Step: 230090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:31:29,602-Speed 3063.67 samples/sec   Loss 1.1167   LearningRate 0.0005   Epoch: 18   Global Step: 230100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:31:32,966-Speed 3044.53 samples/sec   Loss 1.1760   LearningRate 0.0005   Epoch: 18   Global Step: 230110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:31:36,334-Speed 3041.42 samples/sec   Loss 1.0900   LearningRate 0.0005   Epoch: 18   Global Step: 230120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:31:39,750-Speed 2998.36 samples/sec   Loss 1.1464   LearningRate 0.0005   Epoch: 18   Global Step: 230130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:31:43,148-Speed 3014.29 samples/sec   Loss 1.1197   LearningRate 0.0005   Epoch: 18   Global Step: 230140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:31:46,511-Speed 3046.29 samples/sec   Loss 1.1101   LearningRate 0.0005   Epoch: 18   Global Step: 230150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:31:49,853-Speed 3064.61 samples/sec   Loss 1.1068   LearningRate 0.0005   Epoch: 18   Global Step: 230160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:31:53,190-Speed 3069.22 samples/sec   Loss 1.1163   LearningRate 0.0005   Epoch: 18   Global Step: 230170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:31:56,571-Speed 3029.53 samples/sec   Loss 1.1236   LearningRate 0.0005   Epoch: 18   Global Step: 230180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:31:59,964-Speed 3018.96 samples/sec   Loss 1.0571   LearningRate 0.0005   Epoch: 18   Global Step: 230190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:32:03,337-Speed 3037.32 samples/sec   Loss 1.1246   LearningRate 0.0005   Epoch: 18   Global Step: 230200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:32:06,736-Speed 3013.21 samples/sec   Loss 1.0642   LearningRate 0.0005   Epoch: 18   Global Step: 230210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:32:10,141-Speed 3007.99 samples/sec   Loss 1.1229   LearningRate 0.0005   Epoch: 18   Global Step: 230220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:32:13,585-Speed 2974.45 samples/sec   Loss 1.1186   LearningRate 0.0005   Epoch: 18   Global Step: 230230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:32:16,938-Speed 3056.46 samples/sec   Loss 1.0917   LearningRate 0.0005   Epoch: 18   Global Step: 230240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:32:20,251-Speed 3091.17 samples/sec   Loss 1.1227   LearningRate 0.0005   Epoch: 18   Global Step: 230250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:32:23,594-Speed 3064.13 samples/sec   Loss 1.1102   LearningRate 0.0005   Epoch: 18   Global Step: 230260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:32:26,964-Speed 3039.32 samples/sec   Loss 1.1056   LearningRate 0.0005   Epoch: 18   Global Step: 230270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:32:30,278-Speed 3091.43 samples/sec   Loss 1.1305   LearningRate 0.0005   Epoch: 18   Global Step: 230280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:32:33,667-Speed 3022.28 samples/sec   Loss 1.0851   LearningRate 0.0005   Epoch: 18   Global Step: 230290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:32:37,133-Speed 2954.80 samples/sec   Loss 1.1091   LearningRate 0.0005   Epoch: 18   Global Step: 230300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:32:40,594-Speed 2959.76 samples/sec   Loss 1.0984   LearningRate 0.0005   Epoch: 18   Global Step: 230310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:32:44,008-Speed 3000.31 samples/sec   Loss 1.1373   LearningRate 0.0005   Epoch: 18   Global Step: 230320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:32:47,393-Speed 3026.12 samples/sec   Loss 1.0765   LearningRate 0.0005   Epoch: 18   Global Step: 230330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:32:50,800-Speed 3006.50 samples/sec   Loss 1.1057   LearningRate 0.0005   Epoch: 18   Global Step: 230340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:32:54,172-Speed 3037.65 samples/sec   Loss 1.1056   LearningRate 0.0005   Epoch: 18   Global Step: 230350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:32:57,601-Speed 2986.51 samples/sec   Loss 1.0593   LearningRate 0.0005   Epoch: 18   Global Step: 230360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:33:01,021-Speed 2995.28 samples/sec   Loss 1.0917   LearningRate 0.0005   Epoch: 18   Global Step: 230370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:33:04,366-Speed 3061.88 samples/sec   Loss 1.1467   LearningRate 0.0005   Epoch: 18   Global Step: 230380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:33:07,697-Speed 3075.51 samples/sec   Loss 1.1185   LearningRate 0.0005   Epoch: 18   Global Step: 230390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:33:11,083-Speed 3024.72 samples/sec   Loss 1.1154   LearningRate 0.0005   Epoch: 18   Global Step: 230400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:33:14,466-Speed 3027.53 samples/sec   Loss 1.1234   LearningRate 0.0005   Epoch: 18   Global Step: 230410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:33:17,850-Speed 3026.77 samples/sec   Loss 1.1085   LearningRate 0.0005   Epoch: 18   Global Step: 230420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:33:21,218-Speed 3040.89 samples/sec   Loss 1.0867   LearningRate 0.0005   Epoch: 18   Global Step: 230430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:33:24,573-Speed 3053.23 samples/sec   Loss 1.1345   LearningRate 0.0005   Epoch: 18   Global Step: 230440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:33:27,959-Speed 3025.05 samples/sec   Loss 1.0906   LearningRate 0.0005   Epoch: 18   Global Step: 230450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:33:31,401-Speed 2976.22 samples/sec   Loss 1.1457   LearningRate 0.0005   Epoch: 18   Global Step: 230460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:33:34,792-Speed 3021.01 samples/sec   Loss 1.1285   LearningRate 0.0005   Epoch: 18   Global Step: 230470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:33:38,186-Speed 3018.27 samples/sec   Loss 1.0974   LearningRate 0.0005   Epoch: 18   Global Step: 230480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:33:41,541-Speed 3052.79 samples/sec   Loss 1.1252   LearningRate 0.0005   Epoch: 18   Global Step: 230490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:33:44,858-Speed 3087.78 samples/sec   Loss 1.1082   LearningRate 0.0005   Epoch: 18   Global Step: 230500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:33:48,235-Speed 3033.27 samples/sec   Loss 1.1722   LearningRate 0.0005   Epoch: 18   Global Step: 230510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:33:51,580-Speed 3062.41 samples/sec   Loss 1.0860   LearningRate 0.0005   Epoch: 18   Global Step: 230520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:33:54,952-Speed 3037.95 samples/sec   Loss 1.1247   LearningRate 0.0005   Epoch: 18   Global Step: 230530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:33:58,340-Speed 3023.03 samples/sec   Loss 1.1180   LearningRate 0.0005   Epoch: 18   Global Step: 230540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:34:01,704-Speed 3045.22 samples/sec   Loss 1.0824   LearningRate 0.0005   Epoch: 18   Global Step: 230550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:34:05,150-Speed 2972.06 samples/sec   Loss 1.0908   LearningRate 0.0005   Epoch: 18   Global Step: 230560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:34:08,535-Speed 3026.77 samples/sec   Loss 1.0588   LearningRate 0.0005   Epoch: 18   Global Step: 230570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:34:11,931-Speed 3016.84 samples/sec   Loss 1.0903   LearningRate 0.0005   Epoch: 18   Global Step: 230580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:34:15,325-Speed 3017.16 samples/sec   Loss 1.1481   LearningRate 0.0005   Epoch: 18   Global Step: 230590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:34:18,726-Speed 3012.69 samples/sec   Loss 1.1323   LearningRate 0.0005   Epoch: 18   Global Step: 230600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:34:22,118-Speed 3019.87 samples/sec   Loss 1.1136   LearningRate 0.0005   Epoch: 18   Global Step: 230610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:34:25,504-Speed 3025.33 samples/sec   Loss 1.1516   LearningRate 0.0005   Epoch: 18   Global Step: 230620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:34:28,912-Speed 3005.32 samples/sec   Loss 1.0956   LearningRate 0.0005   Epoch: 18   Global Step: 230630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:34:32,270-Speed 3050.11 samples/sec   Loss 1.1072   LearningRate 0.0005   Epoch: 18   Global Step: 230640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:34:35,721-Speed 2967.99 samples/sec   Loss 1.0851   LearningRate 0.0005   Epoch: 18   Global Step: 230650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:34:39,105-Speed 3027.24 samples/sec   Loss 1.0878   LearningRate 0.0005   Epoch: 18   Global Step: 230660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:34:42,542-Speed 2980.32 samples/sec   Loss 1.0719   LearningRate 0.0005   Epoch: 18   Global Step: 230670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:34:46,038-Speed 2929.55 samples/sec   Loss 1.1301   LearningRate 0.0005   Epoch: 18   Global Step: 230680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:34:49,439-Speed 3012.25 samples/sec   Loss 1.0775   LearningRate 0.0005   Epoch: 18   Global Step: 230690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:34:52,777-Speed 3068.41 samples/sec   Loss 1.1239   LearningRate 0.0005   Epoch: 18   Global Step: 230700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:34:56,119-Speed 3065.10 samples/sec   Loss 1.1107   LearningRate 0.0005   Epoch: 18   Global Step: 230710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:34:59,476-Speed 3051.37 samples/sec   Loss 1.1462   LearningRate 0.0005   Epoch: 18   Global Step: 230720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:35:02,882-Speed 3007.47 samples/sec   Loss 1.0864   LearningRate 0.0005   Epoch: 18   Global Step: 230730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:35:06,239-Speed 3051.60 samples/sec   Loss 1.1286   LearningRate 0.0005   Epoch: 18   Global Step: 230740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:35:09,606-Speed 3042.03 samples/sec   Loss 1.1705   LearningRate 0.0005   Epoch: 18   Global Step: 230750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:35:13,017-Speed 3002.57 samples/sec   Loss 1.1139   LearningRate 0.0005   Epoch: 18   Global Step: 230760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:35:16,504-Speed 2938.08 samples/sec   Loss 1.1174   LearningRate 0.0005   Epoch: 18   Global Step: 230770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:35:19,910-Speed 3007.13 samples/sec   Loss 1.1233   LearningRate 0.0005   Epoch: 18   Global Step: 230780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:35:23,271-Speed 3047.54 samples/sec   Loss 1.1071   LearningRate 0.0005   Epoch: 18   Global Step: 230790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:35:26,714-Speed 2974.21 samples/sec   Loss 1.0668   LearningRate 0.0005   Epoch: 18   Global Step: 230800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:35:30,133-Speed 2996.28 samples/sec   Loss 1.1170   LearningRate 0.0005   Epoch: 18   Global Step: 230810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:35:33,466-Speed 3073.40 samples/sec   Loss 1.0821   LearningRate 0.0005   Epoch: 18   Global Step: 230820   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:35:36,843-Speed 3032.49 samples/sec   Loss 1.0955   LearningRate 0.0005   Epoch: 18   Global Step: 230830   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:35:40,248-Speed 3009.01 samples/sec   Loss 1.1170   LearningRate 0.0005   Epoch: 18   Global Step: 230840   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:35:43,673-Speed 2989.94 samples/sec   Loss 1.1234   LearningRate 0.0005   Epoch: 18   Global Step: 230850   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:35:47,025-Speed 3055.53 samples/sec   Loss 1.1563   LearningRate 0.0005   Epoch: 18   Global Step: 230860   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:35:50,425-Speed 3013.35 samples/sec   Loss 1.1690   LearningRate 0.0005   Epoch: 18   Global Step: 230870   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:35:53,775-Speed 3057.39 samples/sec   Loss 1.1111   LearningRate 0.0005   Epoch: 18   Global Step: 230880   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:35:57,098-Speed 3081.84 samples/sec   Loss 1.1113   LearningRate 0.0005   Epoch: 18   Global Step: 230890   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:36:00,474-Speed 3034.47 samples/sec   Loss 1.1048   LearningRate 0.0005   Epoch: 18   Global Step: 230900   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:36:03,872-Speed 3014.12 samples/sec   Loss 1.0752   LearningRate 0.0005   Epoch: 18   Global Step: 230910   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:36:07,261-Speed 3022.55 samples/sec   Loss 1.1049   LearningRate 0.0005   Epoch: 18   Global Step: 230920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:36:10,593-Speed 3074.22 samples/sec   Loss 1.1124   LearningRate 0.0005   Epoch: 18   Global Step: 230930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:36:13,950-Speed 3051.13 samples/sec   Loss 1.1302   LearningRate 0.0005   Epoch: 18   Global Step: 230940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:36:17,312-Speed 3046.71 samples/sec   Loss 1.1006   LearningRate 0.0005   Epoch: 18   Global Step: 230950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:36:20,770-Speed 2961.77 samples/sec   Loss 1.1266   LearningRate 0.0005   Epoch: 18   Global Step: 230960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:36:24,174-Speed 3008.96 samples/sec   Loss 1.0601   LearningRate 0.0005   Epoch: 18   Global Step: 230970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:36:27,582-Speed 3005.45 samples/sec   Loss 1.1232   LearningRate 0.0005   Epoch: 18   Global Step: 230980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:36:30,984-Speed 3011.09 samples/sec   Loss 1.1441   LearningRate 0.0005   Epoch: 18   Global Step: 230990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:36:34,343-Speed 3050.02 samples/sec   Loss 1.0941   LearningRate 0.0005   Epoch: 18   Global Step: 231000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:36:37,729-Speed 3024.53 samples/sec   Loss 1.1231   LearningRate 0.0005   Epoch: 18   Global Step: 231010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:36:41,096-Speed 3042.49 samples/sec   Loss 1.1676   LearningRate 0.0005   Epoch: 18   Global Step: 231020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:36:44,446-Speed 3057.49 samples/sec   Loss 1.0693   LearningRate 0.0005   Epoch: 18   Global Step: 231030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:36:47,884-Speed 2978.98 samples/sec   Loss 1.0988   LearningRate 0.0005   Epoch: 18   Global Step: 231040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:36:51,270-Speed 3025.23 samples/sec   Loss 1.1471   LearningRate 0.0005   Epoch: 18   Global Step: 231050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:36:54,635-Speed 3043.84 samples/sec   Loss 1.1439   LearningRate 0.0005   Epoch: 18   Global Step: 231060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:36:58,031-Speed 3016.30 samples/sec   Loss 1.0939   LearningRate 0.0005   Epoch: 18   Global Step: 231070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:37:01,395-Speed 3044.04 samples/sec   Loss 1.0750   LearningRate 0.0005   Epoch: 18   Global Step: 231080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:37:04,738-Speed 3064.38 samples/sec   Loss 1.1253   LearningRate 0.0005   Epoch: 18   Global Step: 231090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:37:08,123-Speed 3026.27 samples/sec   Loss 1.1288   LearningRate 0.0005   Epoch: 18   Global Step: 231100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:37:11,492-Speed 3040.69 samples/sec   Loss 1.0937   LearningRate 0.0005   Epoch: 18   Global Step: 231110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:37:14,841-Speed 3058.34 samples/sec   Loss 1.1306   LearningRate 0.0005   Epoch: 18   Global Step: 231120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:37:18,159-Speed 3086.60 samples/sec   Loss 1.1070   LearningRate 0.0005   Epoch: 18   Global Step: 231130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:37:21,482-Speed 3082.60 samples/sec   Loss 1.1195   LearningRate 0.0005   Epoch: 18   Global Step: 231140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:37:24,813-Speed 3075.11 samples/sec   Loss 1.0977   LearningRate 0.0005   Epoch: 18   Global Step: 231150   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:37:28,181-Speed 3040.68 samples/sec   Loss 1.1413   LearningRate 0.0005   Epoch: 18   Global Step: 231160   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:37:31,580-Speed 3014.21 samples/sec   Loss 1.0852   LearningRate 0.0005   Epoch: 18   Global Step: 231170   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:37:35,005-Speed 2990.61 samples/sec   Loss 1.1286   LearningRate 0.0005   Epoch: 18   Global Step: 231180   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:37:38,376-Speed 3038.42 samples/sec   Loss 1.1260   LearningRate 0.0005   Epoch: 18   Global Step: 231190   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:37:41,737-Speed 3047.08 samples/sec   Loss 1.1447   LearningRate 0.0005   Epoch: 18   Global Step: 231200   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:37:45,140-Speed 3009.97 samples/sec   Loss 1.1456   LearningRate 0.0005   Epoch: 18   Global Step: 231210   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:37:48,561-Speed 2994.59 samples/sec   Loss 1.1635   LearningRate 0.0005   Epoch: 18   Global Step: 231220   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:37:51,916-Speed 3052.65 samples/sec   Loss 1.1077   LearningRate 0.0005   Epoch: 18   Global Step: 231230   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:37:55,244-Speed 3078.40 samples/sec   Loss 1.0909   LearningRate 0.0005   Epoch: 18   Global Step: 231240   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 23:37:58,653-Speed 3004.10 samples/sec   Loss 1.0698   LearningRate 0.0005   Epoch: 18   Global Step: 231250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:02,116-Speed 2957.62 samples/sec   Loss 1.1024   LearningRate 0.0005   Epoch: 18   Global Step: 231260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:05,458-Speed 3065.72 samples/sec   Loss 1.1635   LearningRate 0.0005   Epoch: 18   Global Step: 231270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:08,871-Speed 3000.34 samples/sec   Loss 1.0874   LearningRate 0.0005   Epoch: 18   Global Step: 231280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:12,234-Speed 3046.18 samples/sec   Loss 1.0839   LearningRate 0.0005   Epoch: 18   Global Step: 231290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:15,562-Speed 3077.34 samples/sec   Loss 1.1152   LearningRate 0.0005   Epoch: 18   Global Step: 231300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:18,976-Speed 3000.00 samples/sec   Loss 1.0885   LearningRate 0.0005   Epoch: 18   Global Step: 231310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:22,414-Speed 2979.52 samples/sec   Loss 1.0975   LearningRate 0.0005   Epoch: 18   Global Step: 231320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:25,755-Speed 3065.91 samples/sec   Loss 1.1079   LearningRate 0.0005   Epoch: 18   Global Step: 231330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:29,077-Speed 3083.00 samples/sec   Loss 1.1313   LearningRate 0.0005   Epoch: 18   Global Step: 231340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:32,488-Speed 3002.70 samples/sec   Loss 1.0749   LearningRate 0.0005   Epoch: 18   Global Step: 231350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:38:35,905-Speed 2997.82 samples/sec   Loss 1.1093   LearningRate 0.0005   Epoch: 18   Global Step: 231360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:38:39,348-Speed 2975.10 samples/sec   Loss 1.0744   LearningRate 0.0005   Epoch: 18   Global Step: 231370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:38:42,719-Speed 3038.79 samples/sec   Loss 1.1151   LearningRate 0.0005   Epoch: 18   Global Step: 231380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:46,088-Speed 3040.10 samples/sec   Loss 1.0993   LearningRate 0.0005   Epoch: 18   Global Step: 231390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:49,472-Speed 3027.51 samples/sec   Loss 1.1258   LearningRate 0.0005   Epoch: 18   Global Step: 231400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:52,789-Speed 3088.01 samples/sec   Loss 1.1242   LearningRate 0.0005   Epoch: 18   Global Step: 231410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:56,187-Speed 3014.35 samples/sec   Loss 1.0770   LearningRate 0.0005   Epoch: 18   Global Step: 231420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:38:59,578-Speed 3020.68 samples/sec   Loss 1.0658   LearningRate 0.0005   Epoch: 18   Global Step: 231430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:39:02,947-Speed 3040.20 samples/sec   Loss 1.1492   LearningRate 0.0005   Epoch: 18   Global Step: 231440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:39:06,321-Speed 3036.22 samples/sec   Loss 1.1293   LearningRate 0.0005   Epoch: 18   Global Step: 231450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:39:09,669-Speed 3058.86 samples/sec   Loss 1.1430   LearningRate 0.0005   Epoch: 18   Global Step: 231460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:39:12,996-Speed 3078.37 samples/sec   Loss 1.1131   LearningRate 0.0005   Epoch: 18   Global Step: 231470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:39:16,389-Speed 3019.19 samples/sec   Loss 1.1050   LearningRate 0.0005   Epoch: 18   Global Step: 231480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:39:19,721-Speed 3074.39 samples/sec   Loss 1.0875   LearningRate 0.0005   Epoch: 18   Global Step: 231490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:39:23,082-Speed 3047.00 samples/sec   Loss 1.1276   LearningRate 0.0005   Epoch: 18   Global Step: 231500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:39:26,515-Speed 2983.80 samples/sec   Loss 1.1008   LearningRate 0.0005   Epoch: 18   Global Step: 231510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:39:29,941-Speed 2989.51 samples/sec   Loss 1.1352   LearningRate 0.0005   Epoch: 18   Global Step: 231520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:39:33,275-Speed 3072.39 samples/sec   Loss 1.1147   LearningRate 0.0005   Epoch: 18   Global Step: 231530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:39:36,681-Speed 3007.22 samples/sec   Loss 1.0977   LearningRate 0.0005   Epoch: 18   Global Step: 231540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:39:40,063-Speed 3028.66 samples/sec   Loss 1.1699   LearningRate 0.0005   Epoch: 18   Global Step: 231550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:39:43,475-Speed 3002.15 samples/sec   Loss 1.1091   LearningRate 0.0005   Epoch: 18   Global Step: 231560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:39:46,841-Speed 3042.75 samples/sec   Loss 1.0720   LearningRate 0.0005   Epoch: 18   Global Step: 231570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:39:50,212-Speed 3038.47 samples/sec   Loss 1.0955   LearningRate 0.0005   Epoch: 18   Global Step: 231580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:39:53,548-Speed 3070.14 samples/sec   Loss 1.1704   LearningRate 0.0005   Epoch: 18   Global Step: 231590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:39:56,928-Speed 3031.00 samples/sec   Loss 1.0992   LearningRate 0.0005   Epoch: 18   Global Step: 231600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:40:00,306-Speed 3032.09 samples/sec   Loss 1.0959   LearningRate 0.0005   Epoch: 18   Global Step: 231610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:40:03,715-Speed 3005.33 samples/sec   Loss 1.1071   LearningRate 0.0005   Epoch: 18   Global Step: 231620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:40:07,089-Speed 3035.85 samples/sec   Loss 1.1036   LearningRate 0.0005   Epoch: 18   Global Step: 231630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:40:10,453-Speed 3044.56 samples/sec   Loss 1.1257   LearningRate 0.0005   Epoch: 18   Global Step: 231640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:40:13,794-Speed 3066.18 samples/sec   Loss 1.0838   LearningRate 0.0005   Epoch: 18   Global Step: 231650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:40:17,143-Speed 3058.27 samples/sec   Loss 1.1323   LearningRate 0.0005   Epoch: 18   Global Step: 231660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:40:20,491-Speed 3059.79 samples/sec   Loss 1.0933   LearningRate 0.0005   Epoch: 18   Global Step: 231670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:40:23,852-Speed 3047.81 samples/sec   Loss 1.1274   LearningRate 0.0005   Epoch: 18   Global Step: 231680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:40:27,185-Speed 3073.27 samples/sec   Loss 1.1455   LearningRate 0.0005   Epoch: 18   Global Step: 231690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:40:30,572-Speed 3024.19 samples/sec   Loss 1.1281   LearningRate 0.0005   Epoch: 18   Global Step: 231700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:40:33,892-Speed 3084.84 samples/sec   Loss 1.1282   LearningRate 0.0005   Epoch: 18   Global Step: 231710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:40:37,240-Speed 3059.53 samples/sec   Loss 1.0751   LearningRate 0.0005   Epoch: 18   Global Step: 231720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:40:40,605-Speed 3044.12 samples/sec   Loss 1.1226   LearningRate 0.0005   Epoch: 18   Global Step: 231730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:40:43,951-Speed 3061.10 samples/sec   Loss 1.1138   LearningRate 0.0005   Epoch: 18   Global Step: 231740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:40:47,325-Speed 3036.25 samples/sec   Loss 1.0861   LearningRate 0.0005   Epoch: 18   Global Step: 231750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:40:50,788-Speed 2957.46 samples/sec   Loss 1.0972   LearningRate 0.0004   Epoch: 18   Global Step: 231760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:40:54,112-Speed 3081.99 samples/sec   Loss 1.1306   LearningRate 0.0004   Epoch: 18   Global Step: 231770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:40:57,460-Speed 3059.75 samples/sec   Loss 1.1659   LearningRate 0.0004   Epoch: 18   Global Step: 231780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:00,883-Speed 2992.14 samples/sec   Loss 1.0832   LearningRate 0.0004   Epoch: 18   Global Step: 231790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:04,273-Speed 3020.81 samples/sec   Loss 1.0773   LearningRate 0.0004   Epoch: 18   Global Step: 231800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:07,606-Speed 3073.51 samples/sec   Loss 1.1564   LearningRate 0.0004   Epoch: 18   Global Step: 231810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:11,060-Speed 2965.18 samples/sec   Loss 1.0861   LearningRate 0.0004   Epoch: 18   Global Step: 231820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:14,471-Speed 3002.76 samples/sec   Loss 1.1108   LearningRate 0.0004   Epoch: 18   Global Step: 231830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:17,842-Speed 3038.43 samples/sec   Loss 1.1169   LearningRate 0.0004   Epoch: 18   Global Step: 231840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:21,183-Speed 3066.28 samples/sec   Loss 1.0932   LearningRate 0.0004   Epoch: 18   Global Step: 231850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:24,558-Speed 3035.09 samples/sec   Loss 1.1002   LearningRate 0.0004   Epoch: 18   Global Step: 231860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:27,897-Speed 3067.50 samples/sec   Loss 1.1073   LearningRate 0.0004   Epoch: 18   Global Step: 231870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:31,229-Speed 3073.83 samples/sec   Loss 1.1336   LearningRate 0.0004   Epoch: 18   Global Step: 231880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:34,647-Speed 2997.08 samples/sec   Loss 1.1271   LearningRate 0.0004   Epoch: 18   Global Step: 231890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:38,012-Speed 3043.98 samples/sec   Loss 1.0845   LearningRate 0.0004   Epoch: 18   Global Step: 231900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:41,440-Speed 2987.47 samples/sec   Loss 1.0653   LearningRate 0.0004   Epoch: 18   Global Step: 231910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:44,855-Speed 2999.39 samples/sec   Loss 1.1458   LearningRate 0.0004   Epoch: 18   Global Step: 231920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:48,269-Speed 3000.61 samples/sec   Loss 1.1028   LearningRate 0.0004   Epoch: 18   Global Step: 231930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:41:51,605-Speed 3070.07 samples/sec   Loss 1.0987   LearningRate 0.0004   Epoch: 18   Global Step: 231940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:41:54,926-Speed 3083.82 samples/sec   Loss 1.1257   LearningRate 0.0004   Epoch: 18   Global Step: 231950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:41:58,275-Speed 3058.68 samples/sec   Loss 1.0965   LearningRate 0.0004   Epoch: 18   Global Step: 231960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:42:01,603-Speed 3078.47 samples/sec   Loss 1.1405   LearningRate 0.0004   Epoch: 18   Global Step: 231970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:42:04,970-Speed 3042.18 samples/sec   Loss 1.1440   LearningRate 0.0004   Epoch: 18   Global Step: 231980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:42:08,304-Speed 3072.11 samples/sec   Loss 1.1137   LearningRate 0.0004   Epoch: 18   Global Step: 231990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:42:11,724-Speed 2994.81 samples/sec   Loss 1.0821   LearningRate 0.0004   Epoch: 18   Global Step: 232000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:42:15,071-Speed 3059.72 samples/sec   Loss 1.0823   LearningRate 0.0004   Epoch: 18   Global Step: 232010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:42:18,486-Speed 2999.33 samples/sec   Loss 1.1180   LearningRate 0.0004   Epoch: 18   Global Step: 232020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:42:21,882-Speed 3016.58 samples/sec   Loss 1.1053   LearningRate 0.0004   Epoch: 18   Global Step: 232030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:42:25,216-Speed 3072.03 samples/sec   Loss 1.1141   LearningRate 0.0004   Epoch: 18   Global Step: 232040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:42:28,680-Speed 2956.80 samples/sec   Loss 1.1305   LearningRate 0.0004   Epoch: 18   Global Step: 232050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:42:32,097-Speed 2997.21 samples/sec   Loss 1.1234   LearningRate 0.0004   Epoch: 18   Global Step: 232060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:42:35,478-Speed 3029.48 samples/sec   Loss 1.1513   LearningRate 0.0004   Epoch: 18   Global Step: 232070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:42:38,852-Speed 3035.93 samples/sec   Loss 1.1490   LearningRate 0.0004   Epoch: 18   Global Step: 232080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:42:42,194-Speed 3065.19 samples/sec   Loss 1.0935   LearningRate 0.0004   Epoch: 18   Global Step: 232090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:42:45,631-Speed 2980.37 samples/sec   Loss 1.1058   LearningRate 0.0004   Epoch: 18   Global Step: 232100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:42:49,006-Speed 3034.40 samples/sec   Loss 1.0994   LearningRate 0.0004   Epoch: 18   Global Step: 232110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:42:52,404-Speed 3014.76 samples/sec   Loss 1.1162   LearningRate 0.0004   Epoch: 18   Global Step: 232120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:42:55,737-Speed 3073.09 samples/sec   Loss 1.1438   LearningRate 0.0004   Epoch: 18   Global Step: 232130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:42:59,087-Speed 3056.95 samples/sec   Loss 1.1233   LearningRate 0.0004   Epoch: 18   Global Step: 232140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:02,504-Speed 2998.07 samples/sec   Loss 1.1409   LearningRate 0.0004   Epoch: 18   Global Step: 232150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:05,871-Speed 3041.39 samples/sec   Loss 1.1038   LearningRate 0.0004   Epoch: 18   Global Step: 232160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:09,233-Speed 3047.07 samples/sec   Loss 1.0518   LearningRate 0.0004   Epoch: 18   Global Step: 232170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:12,574-Speed 3066.24 samples/sec   Loss 1.1126   LearningRate 0.0004   Epoch: 18   Global Step: 232180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:15,928-Speed 3054.02 samples/sec   Loss 1.1081   LearningRate 0.0004   Epoch: 18   Global Step: 232190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:43:19,340-Speed 3001.84 samples/sec   Loss 1.1710   LearningRate 0.0004   Epoch: 18   Global Step: 232200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:22,837-Speed 2929.37 samples/sec   Loss 1.0931   LearningRate 0.0004   Epoch: 18   Global Step: 232210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:26,186-Speed 3058.32 samples/sec   Loss 1.1140   LearningRate 0.0004   Epoch: 18   Global Step: 232220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:29,572-Speed 3024.71 samples/sec   Loss 1.0947   LearningRate 0.0004   Epoch: 18   Global Step: 232230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:32,979-Speed 3005.99 samples/sec   Loss 1.1098   LearningRate 0.0004   Epoch: 18   Global Step: 232240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:36,356-Speed 3034.06 samples/sec   Loss 1.0755   LearningRate 0.0004   Epoch: 18   Global Step: 232250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:39,811-Speed 2964.71 samples/sec   Loss 1.1416   LearningRate 0.0004   Epoch: 18   Global Step: 232260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:43,154-Speed 3063.73 samples/sec   Loss 1.0610   LearningRate 0.0004   Epoch: 18   Global Step: 232270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:46,536-Speed 3028.82 samples/sec   Loss 1.0849   LearningRate 0.0004   Epoch: 18   Global Step: 232280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:49,901-Speed 3044.21 samples/sec   Loss 1.1351   LearningRate 0.0004   Epoch: 18   Global Step: 232290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:43:53,308-Speed 3006.73 samples/sec   Loss 1.1704   LearningRate 0.0004   Epoch: 18   Global Step: 232300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:43:56,688-Speed 3029.65 samples/sec   Loss 1.0847   LearningRate 0.0004   Epoch: 18   Global Step: 232310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:44:00,101-Speed 3002.00 samples/sec   Loss 1.0956   LearningRate 0.0004   Epoch: 18   Global Step: 232320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:44:03,576-Speed 2947.52 samples/sec   Loss 1.1128   LearningRate 0.0004   Epoch: 18   Global Step: 232330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:44:06,977-Speed 3011.21 samples/sec   Loss 1.1440   LearningRate 0.0004   Epoch: 18   Global Step: 232340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:44:10,330-Speed 3055.48 samples/sec   Loss 1.0994   LearningRate 0.0004   Epoch: 18   Global Step: 232350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:44:13,765-Speed 2981.76 samples/sec   Loss 1.1625   LearningRate 0.0004   Epoch: 18   Global Step: 232360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:44:17,149-Speed 3026.23 samples/sec   Loss 1.0661   LearningRate 0.0004   Epoch: 18   Global Step: 232370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:44:20,495-Speed 3061.50 samples/sec   Loss 1.1167   LearningRate 0.0004   Epoch: 18   Global Step: 232380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:44:23,856-Speed 3047.48 samples/sec   Loss 1.0633   LearningRate 0.0004   Epoch: 18   Global Step: 232390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:44:27,240-Speed 3026.81 samples/sec   Loss 1.1286   LearningRate 0.0004   Epoch: 18   Global Step: 232400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:44:30,623-Speed 3028.27 samples/sec   Loss 1.1157   LearningRate 0.0004   Epoch: 18   Global Step: 232410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:44:33,998-Speed 3034.67 samples/sec   Loss 1.0985   LearningRate 0.0004   Epoch: 18   Global Step: 232420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:44:37,317-Speed 3085.48 samples/sec   Loss 1.1305   LearningRate 0.0004   Epoch: 18   Global Step: 232430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:44:40,657-Speed 3067.13 samples/sec   Loss 1.1296   LearningRate 0.0004   Epoch: 18   Global Step: 232440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:44:44,068-Speed 3002.65 samples/sec   Loss 1.1108   LearningRate 0.0004   Epoch: 18   Global Step: 232450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 23:44:47,388-Speed 3085.67 samples/sec   Loss 1.1151   LearningRate 0.0004   Epoch: 18   Global Step: 232460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 23:44:50,787-Speed 3013.10 samples/sec   Loss 1.1030   LearningRate 0.0004   Epoch: 18   Global Step: 232470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:44:54,205-Speed 2996.75 samples/sec   Loss 1.0769   LearningRate 0.0004   Epoch: 18   Global Step: 232480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:44:57,547-Speed 3065.03 samples/sec   Loss 1.1026   LearningRate 0.0004   Epoch: 18   Global Step: 232490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:45:00,974-Speed 2988.84 samples/sec   Loss 1.1625   LearningRate 0.0004   Epoch: 18   Global Step: 232500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:45:04,445-Speed 2950.66 samples/sec   Loss 1.0787   LearningRate 0.0004   Epoch: 18   Global Step: 232510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:45:07,850-Speed 3008.08 samples/sec   Loss 1.0671   LearningRate 0.0004   Epoch: 18   Global Step: 232520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:45:11,269-Speed 2996.46 samples/sec   Loss 1.0639   LearningRate 0.0004   Epoch: 18   Global Step: 232530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:45:14,607-Speed 3067.95 samples/sec   Loss 1.1168   LearningRate 0.0004   Epoch: 18   Global Step: 232540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:45:18,022-Speed 2999.59 samples/sec   Loss 1.1275   LearningRate 0.0004   Epoch: 18   Global Step: 232550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 23:45:21,465-Speed 2974.38 samples/sec   Loss 1.1276   LearningRate 0.0004   Epoch: 18   Global Step: 232560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:45:24,786-Speed 3084.83 samples/sec   Loss 1.1136   LearningRate 0.0004   Epoch: 18   Global Step: 232570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:45:28,135-Speed 3058.63 samples/sec   Loss 1.0906   LearningRate 0.0004   Epoch: 18   Global Step: 232580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:45:31,552-Speed 2997.04 samples/sec   Loss 1.0937   LearningRate 0.0004   Epoch: 18   Global Step: 232590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:45:34,901-Speed 3059.16 samples/sec   Loss 1.1265   LearningRate 0.0004   Epoch: 18   Global Step: 232600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:45:38,280-Speed 3031.69 samples/sec   Loss 1.1137   LearningRate 0.0004   Epoch: 18   Global Step: 232610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:45:41,670-Speed 3021.28 samples/sec   Loss 1.1242   LearningRate 0.0004   Epoch: 18   Global Step: 232620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:45:45,042-Speed 3037.77 samples/sec   Loss 1.0927   LearningRate 0.0004   Epoch: 18   Global Step: 232630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:45:48,459-Speed 2997.83 samples/sec   Loss 1.1349   LearningRate 0.0004   Epoch: 18   Global Step: 232640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:45:51,849-Speed 3020.80 samples/sec   Loss 1.1545   LearningRate 0.0004   Epoch: 18   Global Step: 232650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:45:55,177-Speed 3077.83 samples/sec   Loss 1.1188   LearningRate 0.0004   Epoch: 18   Global Step: 232660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:45:58,520-Speed 3064.98 samples/sec   Loss 1.0955   LearningRate 0.0004   Epoch: 18   Global Step: 232670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:46:01,962-Speed 2975.19 samples/sec   Loss 1.0565   LearningRate 0.0004   Epoch: 18   Global Step: 232680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:46:05,357-Speed 3017.24 samples/sec   Loss 1.1019   LearningRate 0.0004   Epoch: 18   Global Step: 232690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:46:08,710-Speed 3054.20 samples/sec   Loss 1.0903   LearningRate 0.0004   Epoch: 18   Global Step: 232700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:12,064-Speed 3053.83 samples/sec   Loss 1.1018   LearningRate 0.0004   Epoch: 18   Global Step: 232710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:15,463-Speed 3013.36 samples/sec   Loss 1.1080   LearningRate 0.0004   Epoch: 18   Global Step: 232720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:18,796-Speed 3073.59 samples/sec   Loss 1.0852   LearningRate 0.0004   Epoch: 18   Global Step: 232730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:22,198-Speed 3010.68 samples/sec   Loss 1.0759   LearningRate 0.0004   Epoch: 18   Global Step: 232740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:25,685-Speed 2937.27 samples/sec   Loss 1.1114   LearningRate 0.0004   Epoch: 18   Global Step: 232750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:29,140-Speed 2965.17 samples/sec   Loss 1.0797   LearningRate 0.0004   Epoch: 18   Global Step: 232760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:32,585-Speed 2972.86 samples/sec   Loss 1.0911   LearningRate 0.0004   Epoch: 18   Global Step: 232770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:36,063-Speed 2945.10 samples/sec   Loss 1.1386   LearningRate 0.0004   Epoch: 18   Global Step: 232780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:39,442-Speed 3030.95 samples/sec   Loss 1.1324   LearningRate 0.0004   Epoch: 18   Global Step: 232790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:42,764-Speed 3083.16 samples/sec   Loss 1.1486   LearningRate 0.0004   Epoch: 18   Global Step: 232800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 23:46:46,141-Speed 3033.35 samples/sec   Loss 1.1341   LearningRate 0.0004   Epoch: 18   Global Step: 232810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:49,575-Speed 2982.41 samples/sec   Loss 1.1287   LearningRate 0.0004   Epoch: 18   Global Step: 232820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:52,921-Speed 3062.11 samples/sec   Loss 1.1269   LearningRate 0.0004   Epoch: 18   Global Step: 232830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:56,250-Speed 3076.39 samples/sec   Loss 1.1405   LearningRate 0.0004   Epoch: 18   Global Step: 232840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:46:59,588-Speed 3068.81 samples/sec   Loss 1.1209   LearningRate 0.0004   Epoch: 18   Global Step: 232850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:47:02,959-Speed 3038.57 samples/sec   Loss 1.1003   LearningRate 0.0004   Epoch: 18   Global Step: 232860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:47:06,267-Speed 3096.16 samples/sec   Loss 1.0940   LearningRate 0.0004   Epoch: 18   Global Step: 232870   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:47:09,603-Speed 3070.00 samples/sec   Loss 1.0955   LearningRate 0.0004   Epoch: 18   Global Step: 232880   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:47:12,964-Speed 3047.78 samples/sec   Loss 1.1767   LearningRate 0.0004   Epoch: 18   Global Step: 232890   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:47:16,340-Speed 3033.89 samples/sec   Loss 1.1566   LearningRate 0.0004   Epoch: 18   Global Step: 232900   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:47:19,775-Speed 2982.39 samples/sec   Loss 1.1039   LearningRate 0.0004   Epoch: 18   Global Step: 232910   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:47:23,244-Speed 2952.01 samples/sec   Loss 1.0785   LearningRate 0.0004   Epoch: 18   Global Step: 232920   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:47:26,623-Speed 3031.91 samples/sec   Loss 1.1032   LearningRate 0.0004   Epoch: 18   Global Step: 232930   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:47:30,032-Speed 3004.63 samples/sec   Loss 1.1490   LearningRate 0.0004   Epoch: 18   Global Step: 232940   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:47:33,404-Speed 3037.61 samples/sec   Loss 1.1050   LearningRate 0.0004   Epoch: 18   Global Step: 232950   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:47:36,746-Speed 3064.99 samples/sec   Loss 1.1026   LearningRate 0.0004   Epoch: 18   Global Step: 232960   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:47:40,144-Speed 3014.12 samples/sec   Loss 1.1157   LearningRate 0.0004   Epoch: 18   Global Step: 232970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:47:43,525-Speed 3030.09 samples/sec   Loss 1.0837   LearningRate 0.0004   Epoch: 18   Global Step: 232980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:47:46,885-Speed 3047.84 samples/sec   Loss 1.0817   LearningRate 0.0004   Epoch: 18   Global Step: 232990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:47:50,264-Speed 3031.69 samples/sec   Loss 1.1236   LearningRate 0.0004   Epoch: 18   Global Step: 233000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:47:53,650-Speed 3025.06 samples/sec   Loss 1.1537   LearningRate 0.0004   Epoch: 18   Global Step: 233010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:47:57,050-Speed 3012.21 samples/sec   Loss 1.1104   LearningRate 0.0004   Epoch: 18   Global Step: 233020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:48:00,432-Speed 3029.00 samples/sec   Loss 1.1112   LearningRate 0.0004   Epoch: 18   Global Step: 233030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:48:03,758-Speed 3079.84 samples/sec   Loss 1.1154   LearningRate 0.0004   Epoch: 18   Global Step: 233040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:48:07,191-Speed 2982.67 samples/sec   Loss 1.0915   LearningRate 0.0004   Epoch: 18   Global Step: 233050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:48:10,603-Speed 3001.97 samples/sec   Loss 1.1396   LearningRate 0.0004   Epoch: 18   Global Step: 233060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:48:14,036-Speed 2984.07 samples/sec   Loss 1.1259   LearningRate 0.0004   Epoch: 18   Global Step: 233070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:48:17,382-Speed 3061.49 samples/sec   Loss 1.1634   LearningRate 0.0004   Epoch: 18   Global Step: 233080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:48:20,744-Speed 3045.58 samples/sec   Loss 1.1467   LearningRate 0.0004   Epoch: 18   Global Step: 233090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:48:24,129-Speed 3026.25 samples/sec   Loss 1.0943   LearningRate 0.0004   Epoch: 18   Global Step: 233100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:48:27,507-Speed 3032.38 samples/sec   Loss 1.1293   LearningRate 0.0004   Epoch: 18   Global Step: 233110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:48:30,936-Speed 2986.63 samples/sec   Loss 1.0762   LearningRate 0.0004   Epoch: 18   Global Step: 233120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:48:34,267-Speed 3076.07 samples/sec   Loss 1.1205   LearningRate 0.0004   Epoch: 18   Global Step: 233130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:48:37,644-Speed 3032.92 samples/sec   Loss 1.1234   LearningRate 0.0004   Epoch: 18   Global Step: 233140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:48:41,078-Speed 2982.68 samples/sec   Loss 1.1322   LearningRate 0.0004   Epoch: 18   Global Step: 233150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:48:44,432-Speed 3053.37 samples/sec   Loss 1.1081   LearningRate 0.0004   Epoch: 18   Global Step: 233160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:48:47,761-Speed 3076.64 samples/sec   Loss 1.0913   LearningRate 0.0004   Epoch: 18   Global Step: 233170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 23:48:51,129-Speed 3041.97 samples/sec   Loss 1.0899   LearningRate 0.0004   Epoch: 18   Global Step: 233180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 23:48:54,553-Speed 2990.79 samples/sec   Loss 1.0886   LearningRate 0.0004   Epoch: 18   Global Step: 233190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 23:48:57,942-Speed 3022.47 samples/sec   Loss 1.1217   LearningRate 0.0004   Epoch: 18   Global Step: 233200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:01,280-Speed 3069.35 samples/sec   Loss 1.1120   LearningRate 0.0004   Epoch: 18   Global Step: 233210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:04,624-Speed 3062.75 samples/sec   Loss 1.1583   LearningRate 0.0004   Epoch: 18   Global Step: 233220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:08,008-Speed 3026.94 samples/sec   Loss 1.0680   LearningRate 0.0004   Epoch: 18   Global Step: 233230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:11,391-Speed 3027.78 samples/sec   Loss 1.1455   LearningRate 0.0004   Epoch: 18   Global Step: 233240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:14,763-Speed 3037.14 samples/sec   Loss 1.1094   LearningRate 0.0004   Epoch: 18   Global Step: 233250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:18,108-Speed 3063.55 samples/sec   Loss 1.0895   LearningRate 0.0004   Epoch: 18   Global Step: 233260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:21,473-Speed 3044.09 samples/sec   Loss 1.0871   LearningRate 0.0004   Epoch: 18   Global Step: 233270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:24,886-Speed 3000.26 samples/sec   Loss 1.0924   LearningRate 0.0004   Epoch: 18   Global Step: 233280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:28,244-Speed 3050.74 samples/sec   Loss 1.1096   LearningRate 0.0004   Epoch: 18   Global Step: 233290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:31,637-Speed 3019.36 samples/sec   Loss 1.1094   LearningRate 0.0004   Epoch: 18   Global Step: 233300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 23:49:35,027-Speed 3021.04 samples/sec   Loss 1.0998   LearningRate 0.0004   Epoch: 18   Global Step: 233310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:38,450-Speed 2992.35 samples/sec   Loss 1.1556   LearningRate 0.0004   Epoch: 18   Global Step: 233320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:41,845-Speed 3017.28 samples/sec   Loss 1.1353   LearningRate 0.0004   Epoch: 18   Global Step: 233330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:45,196-Speed 3056.36 samples/sec   Loss 1.1240   LearningRate 0.0004   Epoch: 18   Global Step: 233340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:48,594-Speed 3014.94 samples/sec   Loss 1.1502   LearningRate 0.0004   Epoch: 18   Global Step: 233350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:51,993-Speed 3013.04 samples/sec   Loss 1.1014   LearningRate 0.0004   Epoch: 18   Global Step: 233360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:49:55,368-Speed 3035.38 samples/sec   Loss 1.1252   LearningRate 0.0004   Epoch: 18   Global Step: 233370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:49:58,750-Speed 3028.39 samples/sec   Loss 1.0914   LearningRate 0.0004   Epoch: 18   Global Step: 233380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:50:02,088-Speed 3069.05 samples/sec   Loss 1.0640   LearningRate 0.0004   Epoch: 18   Global Step: 233390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:50:05,438-Speed 3057.80 samples/sec   Loss 1.1585   LearningRate 0.0004   Epoch: 18   Global Step: 233400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:50:08,832-Speed 3018.06 samples/sec   Loss 1.1556   LearningRate 0.0004   Epoch: 18   Global Step: 233410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:50:12,274-Speed 2974.98 samples/sec   Loss 1.0927   LearningRate 0.0004   Epoch: 18   Global Step: 233420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:50:15,708-Speed 2982.91 samples/sec   Loss 1.1278   LearningRate 0.0004   Epoch: 18   Global Step: 233430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:50:19,098-Speed 3021.90 samples/sec   Loss 1.1741   LearningRate 0.0004   Epoch: 18   Global Step: 233440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:50:22,454-Speed 3052.30 samples/sec   Loss 1.1200   LearningRate 0.0004   Epoch: 18   Global Step: 233450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:50:25,782-Speed 3077.20 samples/sec   Loss 1.1137   LearningRate 0.0004   Epoch: 18   Global Step: 233460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:50:29,233-Speed 2968.75 samples/sec   Loss 1.0612   LearningRate 0.0004   Epoch: 18   Global Step: 233470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:50:32,688-Speed 2964.74 samples/sec   Loss 1.0706   LearningRate 0.0004   Epoch: 18   Global Step: 233480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:50:36,103-Speed 2999.39 samples/sec   Loss 1.1334   LearningRate 0.0004   Epoch: 18   Global Step: 233490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:50:39,507-Speed 3008.49 samples/sec   Loss 1.1172   LearningRate 0.0004   Epoch: 18   Global Step: 233500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:50:42,914-Speed 3006.13 samples/sec   Loss 1.0731   LearningRate 0.0004   Epoch: 18   Global Step: 233510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:50:46,215-Speed 3102.85 samples/sec   Loss 1.1278   LearningRate 0.0004   Epoch: 18   Global Step: 233520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:50:49,631-Speed 2998.92 samples/sec   Loss 1.1062   LearningRate 0.0004   Epoch: 18   Global Step: 233530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:50:52,965-Speed 3072.83 samples/sec   Loss 1.0712   LearningRate 0.0004   Epoch: 18   Global Step: 233540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:50:56,456-Speed 2933.34 samples/sec   Loss 1.1530   LearningRate 0.0004   Epoch: 18   Global Step: 233550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:50:59,877-Speed 2994.10 samples/sec   Loss 1.0934   LearningRate 0.0004   Epoch: 18   Global Step: 233560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:51:03,238-Speed 3047.87 samples/sec   Loss 1.0545   LearningRate 0.0004   Epoch: 18   Global Step: 233570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:51:06,647-Speed 3004.34 samples/sec   Loss 1.1491   LearningRate 0.0004   Epoch: 18   Global Step: 233580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:51:09,960-Speed 3092.46 samples/sec   Loss 1.1200   LearningRate 0.0004   Epoch: 18   Global Step: 233590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:51:13,436-Speed 2946.63 samples/sec   Loss 1.1260   LearningRate 0.0004   Epoch: 18   Global Step: 233600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:51:16,912-Speed 2946.76 samples/sec   Loss 1.0965   LearningRate 0.0004   Epoch: 18   Global Step: 233610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:51:20,341-Speed 2986.28 samples/sec   Loss 1.1226   LearningRate 0.0004   Epoch: 18   Global Step: 233620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:51:23,691-Speed 3058.34 samples/sec   Loss 1.1079   LearningRate 0.0004   Epoch: 18   Global Step: 233630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:51:27,107-Speed 2998.12 samples/sec   Loss 1.0510   LearningRate 0.0004   Epoch: 18   Global Step: 233640   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:51:30,422-Speed 3090.12 samples/sec   Loss 1.1040   LearningRate 0.0004   Epoch: 18   Global Step: 233650   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:51:33,777-Speed 3053.01 samples/sec   Loss 1.1292   LearningRate 0.0004   Epoch: 18   Global Step: 233660   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:51:37,108-Speed 3074.71 samples/sec   Loss 1.1242   LearningRate 0.0004   Epoch: 18   Global Step: 233670   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:51:40,508-Speed 3012.77 samples/sec   Loss 1.1122   LearningRate 0.0004   Epoch: 18   Global Step: 233680   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:51:43,940-Speed 2984.68 samples/sec   Loss 1.0940   LearningRate 0.0004   Epoch: 18   Global Step: 233690   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:51:47,334-Speed 3017.97 samples/sec   Loss 1.1365   LearningRate 0.0004   Epoch: 18   Global Step: 233700   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:51:50,762-Speed 2988.15 samples/sec   Loss 1.1369   LearningRate 0.0004   Epoch: 18   Global Step: 233710   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:51:54,157-Speed 3016.34 samples/sec   Loss 1.1296   LearningRate 0.0004   Epoch: 18   Global Step: 233720   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:51:57,505-Speed 3060.07 samples/sec   Loss 1.1649   LearningRate 0.0003   Epoch: 18   Global Step: 233730   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 23:52:00,895-Speed 3021.01 samples/sec   Loss 1.1210   LearningRate 0.0003   Epoch: 18   Global Step: 233740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:52:04,353-Speed 2962.22 samples/sec   Loss 1.1133   LearningRate 0.0003   Epoch: 18   Global Step: 233750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:52:07,686-Speed 3072.68 samples/sec   Loss 1.0929   LearningRate 0.0003   Epoch: 18   Global Step: 233760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:52:11,105-Speed 2996.27 samples/sec   Loss 1.1254   LearningRate 0.0003   Epoch: 18   Global Step: 233770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:52:14,443-Speed 3068.87 samples/sec   Loss 1.0982   LearningRate 0.0003   Epoch: 18   Global Step: 233780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:52:17,777-Speed 3072.30 samples/sec   Loss 1.1745   LearningRate 0.0003   Epoch: 18   Global Step: 233790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:52:21,103-Speed 3079.33 samples/sec   Loss 1.1273   LearningRate 0.0003   Epoch: 18   Global Step: 233800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:52:24,491-Speed 3022.82 samples/sec   Loss 1.1219   LearningRate 0.0003   Epoch: 18   Global Step: 233810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:52:27,941-Speed 2968.78 samples/sec   Loss 1.1361   LearningRate 0.0003   Epoch: 18   Global Step: 233820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:52:31,317-Speed 3034.10 samples/sec   Loss 1.1356   LearningRate 0.0003   Epoch: 18   Global Step: 233830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:52:34,722-Speed 3008.92 samples/sec   Loss 1.0817   LearningRate 0.0003   Epoch: 18   Global Step: 233840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:52:38,083-Speed 3047.23 samples/sec   Loss 1.1541   LearningRate 0.0003   Epoch: 18   Global Step: 233850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:52:41,458-Speed 3034.77 samples/sec   Loss 1.1491   LearningRate 0.0003   Epoch: 18   Global Step: 233860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:52:44,808-Speed 3057.50 samples/sec   Loss 1.0617   LearningRate 0.0003   Epoch: 18   Global Step: 233870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:52:48,193-Speed 3025.86 samples/sec   Loss 1.1556   LearningRate 0.0003   Epoch: 18   Global Step: 233880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:52:51,549-Speed 3052.76 samples/sec   Loss 1.1064   LearningRate 0.0003   Epoch: 18   Global Step: 233890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:52:54,959-Speed 3003.22 samples/sec   Loss 1.1400   LearningRate 0.0003   Epoch: 18   Global Step: 233900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:52:58,359-Speed 3012.43 samples/sec   Loss 1.1404   LearningRate 0.0003   Epoch: 18   Global Step: 233910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:53:01,747-Speed 3023.38 samples/sec   Loss 1.1279   LearningRate 0.0003   Epoch: 18   Global Step: 233920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:53:05,111-Speed 3045.14 samples/sec   Loss 1.1050   LearningRate 0.0003   Epoch: 18   Global Step: 233930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:53:08,480-Speed 3040.12 samples/sec   Loss 1.1243   LearningRate 0.0003   Epoch: 18   Global Step: 233940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:53:11,847-Speed 3043.06 samples/sec   Loss 1.0889   LearningRate 0.0003   Epoch: 18   Global Step: 233950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:53:15,213-Speed 3042.26 samples/sec   Loss 1.0846   LearningRate 0.0003   Epoch: 18   Global Step: 233960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:53:18,551-Speed 3068.57 samples/sec   Loss 1.0774   LearningRate 0.0003   Epoch: 18   Global Step: 233970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:53:21,950-Speed 3014.74 samples/sec   Loss 1.1160   LearningRate 0.0003   Epoch: 18   Global Step: 233980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:53:25,351-Speed 3011.70 samples/sec   Loss 1.0989   LearningRate 0.0003   Epoch: 18   Global Step: 233990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:53:28,738-Speed 3023.67 samples/sec   Loss 1.1028   LearningRate 0.0003   Epoch: 18   Global Step: 234000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:53:32,185-Speed 2971.59 samples/sec   Loss 1.1015   LearningRate 0.0003   Epoch: 18   Global Step: 234010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:53:35,671-Speed 2938.25 samples/sec   Loss 1.1469   LearningRate 0.0003   Epoch: 18   Global Step: 234020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:53:39,028-Speed 3051.35 samples/sec   Loss 1.0982   LearningRate 0.0003   Epoch: 18   Global Step: 234030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:53:42,373-Speed 3061.91 samples/sec   Loss 1.0886   LearningRate 0.0003   Epoch: 18   Global Step: 234040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:53:45,757-Speed 3027.06 samples/sec   Loss 1.1170   LearningRate 0.0003   Epoch: 18   Global Step: 234050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:53:49,159-Speed 3010.49 samples/sec   Loss 1.1166   LearningRate 0.0003   Epoch: 18   Global Step: 234060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:53:52,570-Speed 3003.76 samples/sec   Loss 1.1293   LearningRate 0.0003   Epoch: 18   Global Step: 234070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:53:55,914-Speed 3062.27 samples/sec   Loss 1.0785   LearningRate 0.0003   Epoch: 18   Global Step: 234080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:53:59,311-Speed 3015.50 samples/sec   Loss 1.1325   LearningRate 0.0003   Epoch: 18   Global Step: 234090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:02,672-Speed 3048.11 samples/sec   Loss 1.0779   LearningRate 0.0003   Epoch: 18   Global Step: 234100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:06,019-Speed 3059.67 samples/sec   Loss 1.1027   LearningRate 0.0003   Epoch: 18   Global Step: 234110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:09,325-Speed 3102.00 samples/sec   Loss 1.1151   LearningRate 0.0003   Epoch: 18   Global Step: 234120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:12,745-Speed 2995.01 samples/sec   Loss 1.0904   LearningRate 0.0003   Epoch: 18   Global Step: 234130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:16,108-Speed 3045.79 samples/sec   Loss 1.1056   LearningRate 0.0003   Epoch: 18   Global Step: 234140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:19,551-Speed 2975.05 samples/sec   Loss 1.1734   LearningRate 0.0003   Epoch: 18   Global Step: 234150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:23,023-Speed 2950.24 samples/sec   Loss 1.1021   LearningRate 0.0003   Epoch: 18   Global Step: 234160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:26,494-Speed 2950.21 samples/sec   Loss 1.1227   LearningRate 0.0003   Epoch: 18   Global Step: 234170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:29,921-Speed 2989.63 samples/sec   Loss 1.1113   LearningRate 0.0003   Epoch: 18   Global Step: 234180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:33,321-Speed 3012.59 samples/sec   Loss 1.1265   LearningRate 0.0003   Epoch: 18   Global Step: 234190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:36,666-Speed 3061.92 samples/sec   Loss 1.1298   LearningRate 0.0003   Epoch: 18   Global Step: 234200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:40,111-Speed 2974.18 samples/sec   Loss 1.1550   LearningRate 0.0003   Epoch: 18   Global Step: 234210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:43,542-Speed 2985.01 samples/sec   Loss 1.1246   LearningRate 0.0003   Epoch: 18   Global Step: 234220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:47,017-Speed 2947.43 samples/sec   Loss 1.0807   LearningRate 0.0003   Epoch: 18   Global Step: 234230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:50,453-Speed 2980.65 samples/sec   Loss 1.0627   LearningRate 0.0003   Epoch: 18   Global Step: 234240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:54:53,891-Speed 2979.30 samples/sec   Loss 1.1195   LearningRate 0.0003   Epoch: 18   Global Step: 234250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 23:54:57,323-Speed 2984.83 samples/sec   Loss 1.1102   LearningRate 0.0003   Epoch: 18   Global Step: 234260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:55:00,698-Speed 3035.19 samples/sec   Loss 1.1322   LearningRate 0.0003   Epoch: 18   Global Step: 234270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:55:04,057-Speed 3049.02 samples/sec   Loss 1.1286   LearningRate 0.0003   Epoch: 18   Global Step: 234280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:55:07,456-Speed 3013.82 samples/sec   Loss 1.1137   LearningRate 0.0003   Epoch: 18   Global Step: 234290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:55:10,856-Speed 3014.52 samples/sec   Loss 1.1060   LearningRate 0.0003   Epoch: 18   Global Step: 234300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:55:14,224-Speed 3041.11 samples/sec   Loss 1.0929   LearningRate 0.0003   Epoch: 18   Global Step: 234310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:55:17,663-Speed 2978.23 samples/sec   Loss 1.1199   LearningRate 0.0003   Epoch: 18   Global Step: 234320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:55:21,088-Speed 2990.87 samples/sec   Loss 1.0983   LearningRate 0.0003   Epoch: 18   Global Step: 234330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:55:24,480-Speed 3019.46 samples/sec   Loss 1.0978   LearningRate 0.0003   Epoch: 18   Global Step: 234340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:55:27,828-Speed 3059.37 samples/sec   Loss 1.0982   LearningRate 0.0003   Epoch: 18   Global Step: 234350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:55:31,274-Speed 2972.55 samples/sec   Loss 1.0780   LearningRate 0.0003   Epoch: 18   Global Step: 234360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:55:34,605-Speed 3074.84 samples/sec   Loss 1.1012   LearningRate 0.0003   Epoch: 18   Global Step: 234370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:55:37,979-Speed 3036.00 samples/sec   Loss 1.1115   LearningRate 0.0003   Epoch: 18   Global Step: 234380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:55:41,364-Speed 3026.32 samples/sec   Loss 1.1302   LearningRate 0.0003   Epoch: 18   Global Step: 234390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:55:44,791-Speed 2988.37 samples/sec   Loss 1.0758   LearningRate 0.0003   Epoch: 18   Global Step: 234400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:55:48,216-Speed 2990.47 samples/sec   Loss 1.1587   LearningRate 0.0003   Epoch: 18   Global Step: 234410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:55:51,528-Speed 3093.02 samples/sec   Loss 1.1109   LearningRate 0.0003   Epoch: 18   Global Step: 234420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:55:54,900-Speed 3037.62 samples/sec   Loss 1.1165   LearningRate 0.0003   Epoch: 18   Global Step: 234430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:55:58,275-Speed 3034.70 samples/sec   Loss 1.1842   LearningRate 0.0003   Epoch: 18   Global Step: 234440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:56:01,598-Speed 3082.89 samples/sec   Loss 1.1231   LearningRate 0.0003   Epoch: 18   Global Step: 234450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:56:04,934-Speed 3069.86 samples/sec   Loss 1.1399   LearningRate 0.0003   Epoch: 18   Global Step: 234460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:56:08,351-Speed 2997.84 samples/sec   Loss 1.0912   LearningRate 0.0003   Epoch: 18   Global Step: 234470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:56:11,736-Speed 3026.07 samples/sec   Loss 1.1292   LearningRate 0.0003   Epoch: 18   Global Step: 234480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:56:15,174-Speed 2979.01 samples/sec   Loss 1.0769   LearningRate 0.0003   Epoch: 18   Global Step: 234490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:56:18,571-Speed 3015.01 samples/sec   Loss 1.1027   LearningRate 0.0003   Epoch: 18   Global Step: 234500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:56:21,960-Speed 3023.25 samples/sec   Loss 1.0790   LearningRate 0.0003   Epoch: 18   Global Step: 234510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:56:25,307-Speed 3060.41 samples/sec   Loss 1.1109   LearningRate 0.0003   Epoch: 18   Global Step: 234520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:56:28,760-Speed 2966.60 samples/sec   Loss 1.1050   LearningRate 0.0003   Epoch: 18   Global Step: 234530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:56:32,269-Speed 2919.69 samples/sec   Loss 1.1355   LearningRate 0.0003   Epoch: 18   Global Step: 234540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:56:35,618-Speed 3058.70 samples/sec   Loss 1.0854   LearningRate 0.0003   Epoch: 18   Global Step: 234550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:56:38,987-Speed 3040.77 samples/sec   Loss 1.1295   LearningRate 0.0003   Epoch: 18   Global Step: 234560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:56:42,401-Speed 3000.08 samples/sec   Loss 1.1133   LearningRate 0.0003   Epoch: 18   Global Step: 234570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:56:45,843-Speed 2976.40 samples/sec   Loss 1.1286   LearningRate 0.0003   Epoch: 18   Global Step: 234580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:56:49,257-Speed 3000.37 samples/sec   Loss 1.1173   LearningRate 0.0003   Epoch: 18   Global Step: 234590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:56:52,685-Speed 2987.43 samples/sec   Loss 1.1049   LearningRate 0.0003   Epoch: 18   Global Step: 234600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:56:56,028-Speed 3064.26 samples/sec   Loss 1.0980   LearningRate 0.0003   Epoch: 18   Global Step: 234610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:56:59,380-Speed 3055.78 samples/sec   Loss 1.1199   LearningRate 0.0003   Epoch: 18   Global Step: 234620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:57:02,751-Speed 3038.64 samples/sec   Loss 1.1361   LearningRate 0.0003   Epoch: 18   Global Step: 234630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:57:06,104-Speed 3054.28 samples/sec   Loss 1.1024   LearningRate 0.0003   Epoch: 18   Global Step: 234640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:57:09,564-Speed 2960.96 samples/sec   Loss 1.1046   LearningRate 0.0003   Epoch: 18   Global Step: 234650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:57:12,937-Speed 3036.82 samples/sec   Loss 1.1091   LearningRate 0.0003   Epoch: 18   Global Step: 234660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:57:16,342-Speed 3007.89 samples/sec   Loss 1.0724   LearningRate 0.0003   Epoch: 18   Global Step: 234670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:57:19,726-Speed 3026.78 samples/sec   Loss 1.1249   LearningRate 0.0003   Epoch: 18   Global Step: 234680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:57:23,136-Speed 3003.38 samples/sec   Loss 1.0815   LearningRate 0.0003   Epoch: 18   Global Step: 234690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:57:26,607-Speed 2951.12 samples/sec   Loss 1.1293   LearningRate 0.0003   Epoch: 18   Global Step: 234700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:57:30,021-Speed 3000.36 samples/sec   Loss 1.1406   LearningRate 0.0003   Epoch: 18   Global Step: 234710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:57:33,435-Speed 3000.08 samples/sec   Loss 1.1476   LearningRate 0.0003   Epoch: 18   Global Step: 234720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:57:36,920-Speed 2939.26 samples/sec   Loss 1.1233   LearningRate 0.0003   Epoch: 18   Global Step: 234730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:57:40,359-Speed 2978.86 samples/sec   Loss 1.1107   LearningRate 0.0003   Epoch: 18   Global Step: 234740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:57:43,733-Speed 3034.80 samples/sec   Loss 1.1260   LearningRate 0.0003   Epoch: 18   Global Step: 234750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:57:47,105-Speed 3038.07 samples/sec   Loss 1.1147   LearningRate 0.0003   Epoch: 18   Global Step: 234760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:57:50,511-Speed 3007.28 samples/sec   Loss 1.1132   LearningRate 0.0003   Epoch: 18   Global Step: 234770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:57:53,981-Speed 2951.52 samples/sec   Loss 1.0977   LearningRate 0.0003   Epoch: 18   Global Step: 234780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:57:57,412-Speed 2985.79 samples/sec   Loss 1.1508   LearningRate 0.0003   Epoch: 18   Global Step: 234790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:58:00,743-Speed 3074.94 samples/sec   Loss 1.0822   LearningRate 0.0003   Epoch: 18   Global Step: 234800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:58:04,094-Speed 3056.78 samples/sec   Loss 1.1302   LearningRate 0.0003   Epoch: 18   Global Step: 234810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:58:07,397-Speed 3101.17 samples/sec   Loss 1.1374   LearningRate 0.0003   Epoch: 18   Global Step: 234820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:58:10,713-Speed 3089.17 samples/sec   Loss 1.1205   LearningRate 0.0003   Epoch: 18   Global Step: 234830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:58:14,073-Speed 3048.84 samples/sec   Loss 1.1065   LearningRate 0.0003   Epoch: 18   Global Step: 234840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:58:17,400-Speed 3078.51 samples/sec   Loss 1.1440   LearningRate 0.0003   Epoch: 18   Global Step: 234850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:58:20,769-Speed 3040.14 samples/sec   Loss 1.1128   LearningRate 0.0003   Epoch: 18   Global Step: 234860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:58:24,205-Speed 2981.61 samples/sec   Loss 1.1579   LearningRate 0.0003   Epoch: 18   Global Step: 234870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:58:27,650-Speed 2973.23 samples/sec   Loss 1.0889   LearningRate 0.0003   Epoch: 18   Global Step: 234880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:58:31,101-Speed 2968.17 samples/sec   Loss 1.0993   LearningRate 0.0003   Epoch: 18   Global Step: 234890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:58:34,521-Speed 2995.16 samples/sec   Loss 1.1224   LearningRate 0.0003   Epoch: 18   Global Step: 234900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:58:37,889-Speed 3041.37 samples/sec   Loss 1.1079   LearningRate 0.0003   Epoch: 18   Global Step: 234910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:58:41,222-Speed 3073.11 samples/sec   Loss 1.0816   LearningRate 0.0003   Epoch: 18   Global Step: 234920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:58:44,635-Speed 3001.91 samples/sec   Loss 1.1331   LearningRate 0.0003   Epoch: 18   Global Step: 234930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:58:48,079-Speed 2973.68 samples/sec   Loss 1.0940   LearningRate 0.0003   Epoch: 18   Global Step: 234940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:58:51,544-Speed 2955.91 samples/sec   Loss 1.1072   LearningRate 0.0003   Epoch: 18   Global Step: 234950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:58:54,966-Speed 2993.25 samples/sec   Loss 1.0986   LearningRate 0.0003   Epoch: 18   Global Step: 234960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:58:58,387-Speed 2994.45 samples/sec   Loss 1.1294   LearningRate 0.0003   Epoch: 18   Global Step: 234970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:01,855-Speed 2953.71 samples/sec   Loss 1.0906   LearningRate 0.0003   Epoch: 18   Global Step: 234980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:05,200-Speed 3061.98 samples/sec   Loss 1.1092   LearningRate 0.0003   Epoch: 18   Global Step: 234990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:08,597-Speed 3015.37 samples/sec   Loss 1.1110   LearningRate 0.0003   Epoch: 18   Global Step: 235000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:11,902-Speed 3099.42 samples/sec   Loss 1.1333   LearningRate 0.0003   Epoch: 18   Global Step: 235010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:15,284-Speed 3028.32 samples/sec   Loss 1.0810   LearningRate 0.0003   Epoch: 18   Global Step: 235020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:18,597-Speed 3091.35 samples/sec   Loss 1.1358   LearningRate 0.0003   Epoch: 18   Global Step: 235030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:21,985-Speed 3024.03 samples/sec   Loss 1.1061   LearningRate 0.0003   Epoch: 18   Global Step: 235040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:25,328-Speed 3063.48 samples/sec   Loss 1.0970   LearningRate 0.0003   Epoch: 18   Global Step: 235050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:28,799-Speed 2951.60 samples/sec   Loss 1.1261   LearningRate 0.0003   Epoch: 18   Global Step: 235060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-27 23:59:32,205-Speed 3007.15 samples/sec   Loss 1.0977   LearningRate 0.0003   Epoch: 18   Global Step: 235070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:35,580-Speed 3034.67 samples/sec   Loss 1.0514   LearningRate 0.0003   Epoch: 18   Global Step: 235080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:38,977-Speed 3016.57 samples/sec   Loss 1.0976   LearningRate 0.0003   Epoch: 18   Global Step: 235090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:42,431-Speed 2965.47 samples/sec   Loss 1.1204   LearningRate 0.0003   Epoch: 18   Global Step: 235100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:45,854-Speed 2992.56 samples/sec   Loss 1.1246   LearningRate 0.0003   Epoch: 18   Global Step: 235110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 23:59:49,221-Speed 3042.35 samples/sec   Loss 1.0962   LearningRate 0.0003   Epoch: 18   Global Step: 235120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:59:52,575-Speed 3053.93 samples/sec   Loss 1.1299   LearningRate 0.0003   Epoch: 18   Global Step: 235130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:59:55,941-Speed 3042.48 samples/sec   Loss 1.1384   LearningRate 0.0003   Epoch: 18   Global Step: 235140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 23:59:59,324-Speed 3027.88 samples/sec   Loss 1.0883   LearningRate 0.0003   Epoch: 18   Global Step: 235150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:00:02,734-Speed 3004.50 samples/sec   Loss 1.1317   LearningRate 0.0003   Epoch: 18   Global Step: 235160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:00:06,084-Speed 3057.39 samples/sec   Loss 1.1186   LearningRate 0.0003   Epoch: 18   Global Step: 235170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:00:09,495-Speed 3002.89 samples/sec   Loss 1.1161   LearningRate 0.0003   Epoch: 18   Global Step: 235180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:00:12,859-Speed 3044.75 samples/sec   Loss 1.1086   LearningRate 0.0003   Epoch: 18   Global Step: 235190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:00:16,258-Speed 3013.72 samples/sec   Loss 1.0951   LearningRate 0.0003   Epoch: 18   Global Step: 235200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:00:19,643-Speed 3026.25 samples/sec   Loss 1.0952   LearningRate 0.0003   Epoch: 18   Global Step: 235210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:00:23,018-Speed 3034.77 samples/sec   Loss 1.1446   LearningRate 0.0003   Epoch: 18   Global Step: 235220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:00:26,433-Speed 2999.70 samples/sec   Loss 1.1443   LearningRate 0.0003   Epoch: 18   Global Step: 235230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:00:29,788-Speed 3052.56 samples/sec   Loss 1.0969   LearningRate 0.0003   Epoch: 18   Global Step: 235240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:00:33,248-Speed 2960.79 samples/sec   Loss 1.0888   LearningRate 0.0003   Epoch: 18   Global Step: 235250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:00:36,630-Speed 3028.75 samples/sec   Loss 1.1020   LearningRate 0.0003   Epoch: 18   Global Step: 235260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:00:39,958-Speed 3077.97 samples/sec   Loss 1.1376   LearningRate 0.0003   Epoch: 18   Global Step: 235270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:00:43,333-Speed 3035.57 samples/sec   Loss 1.0731   LearningRate 0.0003   Epoch: 18   Global Step: 235280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:00:46,817-Speed 2939.59 samples/sec   Loss 1.0542   LearningRate 0.0003   Epoch: 18   Global Step: 235290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:00:50,323-Speed 2921.44 samples/sec   Loss 1.1306   LearningRate 0.0003   Epoch: 18   Global Step: 235300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:00:53,669-Speed 3061.19 samples/sec   Loss 1.1736   LearningRate 0.0003   Epoch: 18   Global Step: 235310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:00:57,015-Speed 3061.39 samples/sec   Loss 1.1198   LearningRate 0.0003   Epoch: 18   Global Step: 235320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:01:00,461-Speed 2972.32 samples/sec   Loss 1.1270   LearningRate 0.0003   Epoch: 18   Global Step: 235330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:01:03,954-Speed 2932.43 samples/sec   Loss 1.1100   LearningRate 0.0003   Epoch: 18   Global Step: 235340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:01:07,386-Speed 2984.99 samples/sec   Loss 1.1239   LearningRate 0.0003   Epoch: 18   Global Step: 235350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:01:10,836-Speed 2969.46 samples/sec   Loss 1.0854   LearningRate 0.0003   Epoch: 18   Global Step: 235360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:01:14,281-Speed 2973.08 samples/sec   Loss 1.0477   LearningRate 0.0003   Epoch: 18   Global Step: 235370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:01:17,703-Speed 2992.92 samples/sec   Loss 1.1459   LearningRate 0.0003   Epoch: 18   Global Step: 235380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:01:21,121-Speed 2996.35 samples/sec   Loss 1.1117   LearningRate 0.0003   Epoch: 18   Global Step: 235390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:01:24,504-Speed 3028.11 samples/sec   Loss 1.1571   LearningRate 0.0003   Epoch: 18   Global Step: 235400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:01:27,975-Speed 2951.21 samples/sec   Loss 1.1450   LearningRate 0.0003   Epoch: 18   Global Step: 235410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:01:31,311-Speed 3069.73 samples/sec   Loss 1.0437   LearningRate 0.0003   Epoch: 18   Global Step: 235420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:01:34,652-Speed 3066.19 samples/sec   Loss 1.1376   LearningRate 0.0003   Epoch: 18   Global Step: 235430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:01:38,041-Speed 3022.82 samples/sec   Loss 1.1521   LearningRate 0.0003   Epoch: 18   Global Step: 235440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:01:41,450-Speed 3004.73 samples/sec   Loss 1.1437   LearningRate 0.0003   Epoch: 18   Global Step: 235450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:01:44,843-Speed 3018.80 samples/sec   Loss 1.1170   LearningRate 0.0003   Epoch: 18   Global Step: 235460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:01:48,225-Speed 3028.65 samples/sec   Loss 1.1253   LearningRate 0.0003   Epoch: 18   Global Step: 235470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:01:51,644-Speed 2995.23 samples/sec   Loss 1.0995   LearningRate 0.0003   Epoch: 18   Global Step: 235480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:01:55,038-Speed 3018.23 samples/sec   Loss 1.0884   LearningRate 0.0003   Epoch: 18   Global Step: 235490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:01:58,487-Speed 2970.11 samples/sec   Loss 1.1418   LearningRate 0.0003   Epoch: 18   Global Step: 235500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:02:01,889-Speed 3010.81 samples/sec   Loss 1.1729   LearningRate 0.0003   Epoch: 18   Global Step: 235510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:02:05,248-Speed 3049.24 samples/sec   Loss 1.0906   LearningRate 0.0003   Epoch: 18   Global Step: 235520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:02:08,609-Speed 3047.34 samples/sec   Loss 1.0926   LearningRate 0.0003   Epoch: 18   Global Step: 235530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:12,013-Speed 3009.25 samples/sec   Loss 1.1226   LearningRate 0.0003   Epoch: 18   Global Step: 235540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:15,398-Speed 3026.18 samples/sec   Loss 1.1211   LearningRate 0.0003   Epoch: 18   Global Step: 235550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:18,756-Speed 3050.25 samples/sec   Loss 1.0994   LearningRate 0.0003   Epoch: 18   Global Step: 235560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:22,190-Speed 2983.01 samples/sec   Loss 1.1381   LearningRate 0.0003   Epoch: 18   Global Step: 235570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:25,623-Speed 2983.56 samples/sec   Loss 1.1398   LearningRate 0.0003   Epoch: 18   Global Step: 235580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:29,082-Speed 2961.13 samples/sec   Loss 1.1074   LearningRate 0.0003   Epoch: 18   Global Step: 235590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:32,532-Speed 2968.75 samples/sec   Loss 1.1256   LearningRate 0.0003   Epoch: 18   Global Step: 235600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:35,946-Speed 3000.07 samples/sec   Loss 1.0925   LearningRate 0.0003   Epoch: 18   Global Step: 235610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:39,281-Speed 3071.32 samples/sec   Loss 1.1012   LearningRate 0.0003   Epoch: 18   Global Step: 235620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:42,653-Speed 3037.90 samples/sec   Loss 1.1416   LearningRate 0.0003   Epoch: 18   Global Step: 235630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:02:46,000-Speed 3060.86 samples/sec   Loss 1.0977   LearningRate 0.0003   Epoch: 18   Global Step: 235640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:49,368-Speed 3041.28 samples/sec   Loss 1.1062   LearningRate 0.0003   Epoch: 18   Global Step: 235650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:52,775-Speed 3006.29 samples/sec   Loss 1.0933   LearningRate 0.0003   Epoch: 18   Global Step: 235660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:56,107-Speed 3073.60 samples/sec   Loss 1.1359   LearningRate 0.0003   Epoch: 18   Global Step: 235670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:02:59,452-Speed 3062.38 samples/sec   Loss 1.1036   LearningRate 0.0003   Epoch: 18   Global Step: 235680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:03:02,824-Speed 3037.69 samples/sec   Loss 1.1590   LearningRate 0.0003   Epoch: 18   Global Step: 235690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:03:06,194-Speed 3039.08 samples/sec   Loss 1.0957   LearningRate 0.0003   Epoch: 18   Global Step: 235700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:03:10,229-Speed 2538.96 samples/sec   Loss 1.1000   LearningRate 0.0003   Epoch: 18   Global Step: 235710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:03:13,633-Speed 3008.73 samples/sec   Loss 1.1040   LearningRate 0.0003   Epoch: 18   Global Step: 235720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:03:17,712-Speed 2510.93 samples/sec   Loss 1.1089   LearningRate 0.0003   Epoch: 18   Global Step: 235730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:03:21,638-Speed 2608.81 samples/sec   Loss 1.1048   LearningRate 0.0003   Epoch: 18   Global Step: 235740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:03:24,996-Speed 3050.56 samples/sec   Loss 1.1115   LearningRate 0.0003   Epoch: 18   Global Step: 235750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:03:28,963-Speed 2581.92 samples/sec   Loss 1.0911   LearningRate 0.0003   Epoch: 18   Global Step: 235760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:03:32,319-Speed 3052.41 samples/sec   Loss 1.0565   LearningRate 0.0003   Epoch: 18   Global Step: 235770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:03:35,734-Speed 2998.66 samples/sec   Loss 1.0886   LearningRate 0.0003   Epoch: 18   Global Step: 235780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:03:39,144-Speed 3004.76 samples/sec   Loss 1.0919   LearningRate 0.0003   Epoch: 18   Global Step: 235790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:03:42,590-Speed 2971.68 samples/sec   Loss 1.1324   LearningRate 0.0003   Epoch: 18   Global Step: 235800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:03:45,999-Speed 3005.40 samples/sec   Loss 1.1350   LearningRate 0.0003   Epoch: 18   Global Step: 235810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:03:49,440-Speed 2976.54 samples/sec   Loss 1.0993   LearningRate 0.0003   Epoch: 18   Global Step: 235820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:03:52,779-Speed 3067.16 samples/sec   Loss 1.1305   LearningRate 0.0003   Epoch: 18   Global Step: 235830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:03:56,162-Speed 3028.50 samples/sec   Loss 1.0910   LearningRate 0.0003   Epoch: 18   Global Step: 235840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:03:59,530-Speed 3041.42 samples/sec   Loss 1.0992   LearningRate 0.0003   Epoch: 18   Global Step: 235850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:04:02,926-Speed 3016.15 samples/sec   Loss 1.1372   LearningRate 0.0003   Epoch: 18   Global Step: 235860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:04:06,356-Speed 2985.68 samples/sec   Loss 1.1053   LearningRate 0.0003   Epoch: 18   Global Step: 235870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:04:09,774-Speed 2996.92 samples/sec   Loss 1.1171   LearningRate 0.0003   Epoch: 18   Global Step: 235880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:04:13,171-Speed 3015.55 samples/sec   Loss 1.1555   LearningRate 0.0003   Epoch: 18   Global Step: 235890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:04:16,575-Speed 3010.13 samples/sec   Loss 1.1206   LearningRate 0.0003   Epoch: 18   Global Step: 235900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:04:20,019-Speed 2974.70 samples/sec   Loss 1.0833   LearningRate 0.0003   Epoch: 18   Global Step: 235910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:04:23,417-Speed 3013.96 samples/sec   Loss 1.1168   LearningRate 0.0003   Epoch: 18   Global Step: 235920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:04:26,795-Speed 3031.98 samples/sec   Loss 1.1404   LearningRate 0.0003   Epoch: 18   Global Step: 235930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:04:30,164-Speed 3040.68 samples/sec   Loss 1.0984   LearningRate 0.0003   Epoch: 18   Global Step: 235940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:04:33,619-Speed 2964.67 samples/sec   Loss 1.1256   LearningRate 0.0003   Epoch: 18   Global Step: 235950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:04:36,977-Speed 3049.77 samples/sec   Loss 1.1095   LearningRate 0.0003   Epoch: 18   Global Step: 235960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:04:40,432-Speed 2965.00 samples/sec   Loss 1.0843   LearningRate 0.0003   Epoch: 18   Global Step: 235970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:04:43,858-Speed 2989.26 samples/sec   Loss 1.1428   LearningRate 0.0003   Epoch: 18   Global Step: 235980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:04:47,585-Speed 2748.52 samples/sec   Loss 1.1219   LearningRate 0.0003   Epoch: 18   Global Step: 235990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:05:20,223-Speed 313.76 samples/sec   Loss 1.0956   LearningRate 0.0002   Epoch: 19   Global Step: 236000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:05:23,845-Speed 2828.39 samples/sec   Loss 0.8540   LearningRate 0.0002   Epoch: 19   Global Step: 236010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:05:27,468-Speed 2827.50 samples/sec   Loss 0.8530   LearningRate 0.0002   Epoch: 19   Global Step: 236020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:05:31,130-Speed 2799.18 samples/sec   Loss 0.8778   LearningRate 0.0002   Epoch: 19   Global Step: 236030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:05:34,504-Speed 3035.41 samples/sec   Loss 0.8590   LearningRate 0.0002   Epoch: 19   Global Step: 236040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:05:38,039-Speed 2897.89 samples/sec   Loss 0.8080   LearningRate 0.0002   Epoch: 19   Global Step: 236050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:05:41,644-Speed 2841.75 samples/sec   Loss 0.8558   LearningRate 0.0002   Epoch: 19   Global Step: 236060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:05:45,200-Speed 2880.73 samples/sec   Loss 0.8641   LearningRate 0.0002   Epoch: 19   Global Step: 236070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:05:48,754-Speed 2881.98 samples/sec   Loss 0.8799   LearningRate 0.0002   Epoch: 19   Global Step: 236080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:05:52,172-Speed 2996.66 samples/sec   Loss 0.8695   LearningRate 0.0002   Epoch: 19   Global Step: 236090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:05:55,633-Speed 2959.30 samples/sec   Loss 0.8866   LearningRate 0.0002   Epoch: 19   Global Step: 236100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:05:59,514-Speed 2639.39 samples/sec   Loss 0.8568   LearningRate 0.0002   Epoch: 19   Global Step: 236110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:02,852-Speed 3068.66 samples/sec   Loss 0.8360   LearningRate 0.0002   Epoch: 19   Global Step: 236120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:06,253-Speed 3011.71 samples/sec   Loss 0.8445   LearningRate 0.0002   Epoch: 19   Global Step: 236130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:10,268-Speed 2551.00 samples/sec   Loss 0.8518   LearningRate 0.0002   Epoch: 19   Global Step: 236140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:13,679-Speed 3003.42 samples/sec   Loss 0.8752   LearningRate 0.0002   Epoch: 19   Global Step: 236150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:17,002-Speed 3082.55 samples/sec   Loss 0.8762   LearningRate 0.0002   Epoch: 19   Global Step: 236160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:20,396-Speed 3017.36 samples/sec   Loss 0.8463   LearningRate 0.0002   Epoch: 19   Global Step: 236170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:23,800-Speed 3009.43 samples/sec   Loss 0.8655   LearningRate 0.0002   Epoch: 19   Global Step: 236180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:27,273-Speed 2949.34 samples/sec   Loss 0.8670   LearningRate 0.0002   Epoch: 19   Global Step: 236190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:30,743-Speed 2952.71 samples/sec   Loss 0.8638   LearningRate 0.0002   Epoch: 19   Global Step: 236200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:34,206-Speed 2957.16 samples/sec   Loss 0.8420   LearningRate 0.0002   Epoch: 19   Global Step: 236210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:37,613-Speed 3006.51 samples/sec   Loss 0.8969   LearningRate 0.0002   Epoch: 19   Global Step: 236220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:40,994-Speed 3029.57 samples/sec   Loss 0.8794   LearningRate 0.0002   Epoch: 19   Global Step: 236230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:44,354-Speed 3048.38 samples/sec   Loss 0.8649   LearningRate 0.0002   Epoch: 19   Global Step: 236240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:06:47,664-Speed 3094.91 samples/sec   Loss 0.8964   LearningRate 0.0002   Epoch: 19   Global Step: 236250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:51,032-Speed 3040.87 samples/sec   Loss 0.8415   LearningRate 0.0002   Epoch: 19   Global Step: 236260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:54,427-Speed 3016.78 samples/sec   Loss 0.8747   LearningRate 0.0002   Epoch: 19   Global Step: 236270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:06:57,861-Speed 2983.21 samples/sec   Loss 0.8484   LearningRate 0.0002   Epoch: 19   Global Step: 236280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:07:01,373-Speed 2916.39 samples/sec   Loss 0.8713   LearningRate 0.0002   Epoch: 19   Global Step: 236290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:07:04,870-Speed 2929.77 samples/sec   Loss 0.8614   LearningRate 0.0002   Epoch: 19   Global Step: 236300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:07:08,179-Speed 3094.61 samples/sec   Loss 0.8814   LearningRate 0.0002   Epoch: 19   Global Step: 236310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:07:12,487-Speed 2377.99 samples/sec   Loss 0.8211   LearningRate 0.0002   Epoch: 19   Global Step: 236320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:07:15,868-Speed 3029.34 samples/sec   Loss 0.9200   LearningRate 0.0002   Epoch: 19   Global Step: 236330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:07:19,235-Speed 3042.04 samples/sec   Loss 0.8593   LearningRate 0.0002   Epoch: 19   Global Step: 236340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:07:22,652-Speed 2997.35 samples/sec   Loss 0.8721   LearningRate 0.0002   Epoch: 19   Global Step: 236350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:07:25,998-Speed 3061.55 samples/sec   Loss 0.8791   LearningRate 0.0002   Epoch: 19   Global Step: 236360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:07:29,414-Speed 2998.15 samples/sec   Loss 0.8739   LearningRate 0.0002   Epoch: 19   Global Step: 236370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:07:32,799-Speed 3026.53 samples/sec   Loss 0.8477   LearningRate 0.0002   Epoch: 19   Global Step: 236380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:07:36,139-Speed 3066.16 samples/sec   Loss 0.8205   LearningRate 0.0002   Epoch: 19   Global Step: 236390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:07:39,555-Speed 2999.07 samples/sec   Loss 0.8814   LearningRate 0.0002   Epoch: 19   Global Step: 236400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:07:42,884-Speed 3076.80 samples/sec   Loss 0.8817   LearningRate 0.0002   Epoch: 19   Global Step: 236410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:07:46,322-Speed 2979.25 samples/sec   Loss 0.8531   LearningRate 0.0002   Epoch: 19   Global Step: 236420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:07:49,819-Speed 2929.05 samples/sec   Loss 0.8417   LearningRate 0.0002   Epoch: 19   Global Step: 236430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:07:53,236-Speed 2997.84 samples/sec   Loss 0.8825   LearningRate 0.0002   Epoch: 19   Global Step: 236440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:07:56,578-Speed 3064.90 samples/sec   Loss 0.9153   LearningRate 0.0002   Epoch: 19   Global Step: 236450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:07:59,992-Speed 3000.27 samples/sec   Loss 0.8550   LearningRate 0.0002   Epoch: 19   Global Step: 236460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:08:03,399-Speed 3006.60 samples/sec   Loss 0.8458   LearningRate 0.0002   Epoch: 19   Global Step: 236470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:08:06,874-Speed 2949.06 samples/sec   Loss 0.8552   LearningRate 0.0002   Epoch: 19   Global Step: 236480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:08:10,319-Speed 2973.27 samples/sec   Loss 0.8800   LearningRate 0.0002   Epoch: 19   Global Step: 236490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:08:13,661-Speed 3064.98 samples/sec   Loss 0.8364   LearningRate 0.0002   Epoch: 19   Global Step: 236500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:08:17,013-Speed 3055.69 samples/sec   Loss 0.8510   LearningRate 0.0002   Epoch: 19   Global Step: 236510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:08:20,384-Speed 3038.41 samples/sec   Loss 0.8491   LearningRate 0.0002   Epoch: 19   Global Step: 236520   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:08:23,744-Speed 3049.16 samples/sec   Loss 0.8186   LearningRate 0.0002   Epoch: 19   Global Step: 236530   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:08:27,107-Speed 3045.10 samples/sec   Loss 0.8881   LearningRate 0.0002   Epoch: 19   Global Step: 236540   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:08:30,503-Speed 3016.23 samples/sec   Loss 0.8079   LearningRate 0.0002   Epoch: 19   Global Step: 236550   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:08:33,874-Speed 3038.37 samples/sec   Loss 0.8872   LearningRate 0.0002   Epoch: 19   Global Step: 236560   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:08:37,283-Speed 3004.57 samples/sec   Loss 0.8598   LearningRate 0.0002   Epoch: 19   Global Step: 236570   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:08:40,687-Speed 3008.85 samples/sec   Loss 0.9039   LearningRate 0.0002   Epoch: 19   Global Step: 236580   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:08:44,128-Speed 2976.88 samples/sec   Loss 0.8300   LearningRate 0.0002   Epoch: 19   Global Step: 236590   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:08:47,478-Speed 3057.12 samples/sec   Loss 0.8701   LearningRate 0.0002   Epoch: 19   Global Step: 236600   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:08:50,839-Speed 3048.03 samples/sec   Loss 0.8648   LearningRate 0.0002   Epoch: 19   Global Step: 236610   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:08:54,196-Speed 3051.44 samples/sec   Loss 0.8475   LearningRate 0.0002   Epoch: 19   Global Step: 236620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:08:57,566-Speed 3038.86 samples/sec   Loss 0.8603   LearningRate 0.0002   Epoch: 19   Global Step: 236630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:09:00,927-Speed 3047.92 samples/sec   Loss 0.8517   LearningRate 0.0002   Epoch: 19   Global Step: 236640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:09:04,296-Speed 3040.29 samples/sec   Loss 0.9001   LearningRate 0.0002   Epoch: 19   Global Step: 236650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:09:07,688-Speed 3023.11 samples/sec   Loss 0.8619   LearningRate 0.0002   Epoch: 19   Global Step: 236660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:09:11,030-Speed 3065.20 samples/sec   Loss 0.8790   LearningRate 0.0002   Epoch: 19   Global Step: 236670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:09:14,388-Speed 3049.83 samples/sec   Loss 0.8728   LearningRate 0.0002   Epoch: 19   Global Step: 236680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:09:17,846-Speed 2962.35 samples/sec   Loss 0.8787   LearningRate 0.0002   Epoch: 19   Global Step: 236690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:09:21,223-Speed 3032.96 samples/sec   Loss 0.8365   LearningRate 0.0002   Epoch: 19   Global Step: 236700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:09:24,545-Speed 3083.22 samples/sec   Loss 0.8691   LearningRate 0.0002   Epoch: 19   Global Step: 236710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:09:27,921-Speed 3034.41 samples/sec   Loss 0.8532   LearningRate 0.0002   Epoch: 19   Global Step: 236720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:09:31,311-Speed 3021.78 samples/sec   Loss 0.8583   LearningRate 0.0002   Epoch: 19   Global Step: 236730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:09:34,653-Speed 3064.84 samples/sec   Loss 0.8512   LearningRate 0.0002   Epoch: 19   Global Step: 236740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:09:38,110-Speed 2962.43 samples/sec   Loss 0.8986   LearningRate 0.0002   Epoch: 19   Global Step: 236750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:09:41,460-Speed 3057.77 samples/sec   Loss 0.8442   LearningRate 0.0002   Epoch: 19   Global Step: 236760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:09:44,834-Speed 3035.90 samples/sec   Loss 0.8686   LearningRate 0.0002   Epoch: 19   Global Step: 236770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:09:48,156-Speed 3083.58 samples/sec   Loss 0.8388   LearningRate 0.0002   Epoch: 19   Global Step: 236780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:09:51,510-Speed 3054.05 samples/sec   Loss 0.9069   LearningRate 0.0002   Epoch: 19   Global Step: 236790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:09:54,833-Speed 3081.89 samples/sec   Loss 0.8837   LearningRate 0.0002   Epoch: 19   Global Step: 236800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:09:58,213-Speed 3030.59 samples/sec   Loss 0.8791   LearningRate 0.0002   Epoch: 19   Global Step: 236810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:10:01,602-Speed 3022.42 samples/sec   Loss 0.8458   LearningRate 0.0002   Epoch: 19   Global Step: 236820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:10:04,958-Speed 3052.12 samples/sec   Loss 0.8738   LearningRate 0.0002   Epoch: 19   Global Step: 236830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:10:08,339-Speed 3029.90 samples/sec   Loss 0.8901   LearningRate 0.0002   Epoch: 19   Global Step: 236840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:10:11,733-Speed 3017.67 samples/sec   Loss 0.8489   LearningRate 0.0002   Epoch: 19   Global Step: 236850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:10:15,083-Speed 3057.92 samples/sec   Loss 0.8496   LearningRate 0.0002   Epoch: 19   Global Step: 236860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:10:18,464-Speed 3029.66 samples/sec   Loss 0.8542   LearningRate 0.0002   Epoch: 19   Global Step: 236870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:10:21,827-Speed 3045.63 samples/sec   Loss 0.8482   LearningRate 0.0002   Epoch: 19   Global Step: 236880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:10:25,164-Speed 3069.61 samples/sec   Loss 0.8434   LearningRate 0.0002   Epoch: 19   Global Step: 236890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:10:28,537-Speed 3037.04 samples/sec   Loss 0.8471   LearningRate 0.0002   Epoch: 19   Global Step: 236900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:10:31,938-Speed 3011.01 samples/sec   Loss 0.9328   LearningRate 0.0002   Epoch: 19   Global Step: 236910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:10:35,338-Speed 3012.92 samples/sec   Loss 0.8993   LearningRate 0.0002   Epoch: 19   Global Step: 236920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:10:38,688-Speed 3058.52 samples/sec   Loss 0.8577   LearningRate 0.0002   Epoch: 19   Global Step: 236930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:10:42,126-Speed 2978.61 samples/sec   Loss 0.8904   LearningRate 0.0002   Epoch: 19   Global Step: 236940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:10:45,525-Speed 3014.20 samples/sec   Loss 0.8512   LearningRate 0.0002   Epoch: 19   Global Step: 236950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:10:48,929-Speed 3008.93 samples/sec   Loss 0.8740   LearningRate 0.0002   Epoch: 19   Global Step: 236960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:10:52,283-Speed 3053.17 samples/sec   Loss 0.8648   LearningRate 0.0002   Epoch: 19   Global Step: 236970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:10:55,574-Speed 3112.44 samples/sec   Loss 0.8695   LearningRate 0.0002   Epoch: 19   Global Step: 236980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:10:58,911-Speed 3070.24 samples/sec   Loss 0.8705   LearningRate 0.0002   Epoch: 19   Global Step: 236990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:11:02,270-Speed 3048.66 samples/sec   Loss 0.9119   LearningRate 0.0002   Epoch: 19   Global Step: 237000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:11:05,666-Speed 3016.72 samples/sec   Loss 0.8763   LearningRate 0.0002   Epoch: 19   Global Step: 237010   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:11:09,152-Speed 2938.40 samples/sec   Loss 0.8597   LearningRate 0.0002   Epoch: 19   Global Step: 237020   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:11:12,630-Speed 2944.13 samples/sec   Loss 0.8470   LearningRate 0.0002   Epoch: 19   Global Step: 237030   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:11:15,982-Speed 3056.70 samples/sec   Loss 0.8755   LearningRate 0.0002   Epoch: 19   Global Step: 237040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:11:19,310-Speed 3077.03 samples/sec   Loss 0.8939   LearningRate 0.0002   Epoch: 19   Global Step: 237050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:11:22,690-Speed 3030.41 samples/sec   Loss 0.8395   LearningRate 0.0002   Epoch: 19   Global Step: 237060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:11:26,044-Speed 3053.92 samples/sec   Loss 0.8688   LearningRate 0.0002   Epoch: 19   Global Step: 237070   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:11:29,443-Speed 3014.13 samples/sec   Loss 0.8519   LearningRate 0.0002   Epoch: 19   Global Step: 237080   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:11:32,801-Speed 3049.63 samples/sec   Loss 0.8218   LearningRate 0.0002   Epoch: 19   Global Step: 237090   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:11:36,171-Speed 3039.28 samples/sec   Loss 0.8891   LearningRate 0.0002   Epoch: 19   Global Step: 237100   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-28 00:11:39,545-Speed 3036.53 samples/sec   Loss 0.8584   LearningRate 0.0002   Epoch: 19   Global Step: 237110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:11:42,958-Speed 3000.95 samples/sec   Loss 0.8264   LearningRate 0.0002   Epoch: 19   Global Step: 237120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:11:46,341-Speed 3027.46 samples/sec   Loss 0.8967   LearningRate 0.0002   Epoch: 19   Global Step: 237130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:11:49,731-Speed 3021.93 samples/sec   Loss 0.8902   LearningRate 0.0002   Epoch: 19   Global Step: 237140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:11:53,084-Speed 3054.87 samples/sec   Loss 0.8688   LearningRate 0.0002   Epoch: 19   Global Step: 237150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:11:56,406-Speed 3082.61 samples/sec   Loss 0.8421   LearningRate 0.0002   Epoch: 19   Global Step: 237160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:11:59,720-Speed 3091.44 samples/sec   Loss 0.8810   LearningRate 0.0002   Epoch: 19   Global Step: 237170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:03,102-Speed 3028.45 samples/sec   Loss 0.9124   LearningRate 0.0002   Epoch: 19   Global Step: 237180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:06,450-Speed 3059.36 samples/sec   Loss 0.8889   LearningRate 0.0002   Epoch: 19   Global Step: 237190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:09,792-Speed 3065.82 samples/sec   Loss 0.8761   LearningRate 0.0002   Epoch: 19   Global Step: 237200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:13,168-Speed 3034.16 samples/sec   Loss 0.8694   LearningRate 0.0002   Epoch: 19   Global Step: 237210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:12:16,511-Speed 3064.22 samples/sec   Loss 0.8463   LearningRate 0.0002   Epoch: 19   Global Step: 237220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:19,864-Speed 3053.81 samples/sec   Loss 0.8869   LearningRate 0.0002   Epoch: 19   Global Step: 237230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:23,314-Speed 2969.06 samples/sec   Loss 0.8749   LearningRate 0.0002   Epoch: 19   Global Step: 237240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:26,669-Speed 3053.75 samples/sec   Loss 0.8908   LearningRate 0.0002   Epoch: 19   Global Step: 237250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:29,999-Speed 3076.19 samples/sec   Loss 0.8523   LearningRate 0.0002   Epoch: 19   Global Step: 237260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:33,560-Speed 2875.97 samples/sec   Loss 0.8595   LearningRate 0.0002   Epoch: 19   Global Step: 237270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:36,930-Speed 3039.72 samples/sec   Loss 0.8424   LearningRate 0.0002   Epoch: 19   Global Step: 237280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:40,311-Speed 3029.73 samples/sec   Loss 0.8488   LearningRate 0.0002   Epoch: 19   Global Step: 237290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:43,652-Speed 3066.30 samples/sec   Loss 0.8699   LearningRate 0.0002   Epoch: 19   Global Step: 237300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:47,017-Speed 3043.26 samples/sec   Loss 0.8451   LearningRate 0.0002   Epoch: 19   Global Step: 237310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:12:50,426-Speed 3005.20 samples/sec   Loss 0.8539   LearningRate 0.0002   Epoch: 19   Global Step: 237320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:12:53,819-Speed 3018.32 samples/sec   Loss 0.8634   LearningRate 0.0002   Epoch: 19   Global Step: 237330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:12:57,151-Speed 3074.01 samples/sec   Loss 0.8574   LearningRate 0.0002   Epoch: 19   Global Step: 237340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:00,575-Speed 2991.77 samples/sec   Loss 0.8822   LearningRate 0.0002   Epoch: 19   Global Step: 237350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:03,884-Speed 3095.59 samples/sec   Loss 0.8313   LearningRate 0.0002   Epoch: 19   Global Step: 237360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:07,213-Speed 3076.54 samples/sec   Loss 0.9024   LearningRate 0.0002   Epoch: 19   Global Step: 237370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:10,541-Speed 3078.50 samples/sec   Loss 0.8511   LearningRate 0.0002   Epoch: 19   Global Step: 237380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:13,851-Speed 3094.27 samples/sec   Loss 0.8560   LearningRate 0.0002   Epoch: 19   Global Step: 237390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:17,246-Speed 3017.65 samples/sec   Loss 0.8267   LearningRate 0.0002   Epoch: 19   Global Step: 237400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:20,631-Speed 3025.87 samples/sec   Loss 0.9028   LearningRate 0.0002   Epoch: 19   Global Step: 237410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:24,009-Speed 3031.63 samples/sec   Loss 0.8707   LearningRate 0.0002   Epoch: 19   Global Step: 237420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:13:27,496-Speed 2937.57 samples/sec   Loss 0.8471   LearningRate 0.0002   Epoch: 19   Global Step: 237430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:13:30,884-Speed 3023.66 samples/sec   Loss 0.8870   LearningRate 0.0002   Epoch: 19   Global Step: 237440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:34,240-Speed 3051.57 samples/sec   Loss 0.8688   LearningRate 0.0002   Epoch: 19   Global Step: 237450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:37,576-Speed 3070.08 samples/sec   Loss 0.8283   LearningRate 0.0002   Epoch: 19   Global Step: 237460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:40,965-Speed 3023.27 samples/sec   Loss 0.8765   LearningRate 0.0002   Epoch: 19   Global Step: 237470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:44,403-Speed 2978.95 samples/sec   Loss 0.8621   LearningRate 0.0002   Epoch: 19   Global Step: 237480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:47,832-Speed 2987.62 samples/sec   Loss 0.8701   LearningRate 0.0002   Epoch: 19   Global Step: 237490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:51,294-Speed 2958.91 samples/sec   Loss 0.8722   LearningRate 0.0002   Epoch: 19   Global Step: 237500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:54,770-Speed 2946.57 samples/sec   Loss 0.8272   LearningRate 0.0002   Epoch: 19   Global Step: 237510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:13:58,155-Speed 3025.65 samples/sec   Loss 0.8397   LearningRate 0.0002   Epoch: 19   Global Step: 237520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:01,548-Speed 3021.49 samples/sec   Loss 0.8414   LearningRate 0.0002   Epoch: 19   Global Step: 237530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:04,862-Speed 3089.76 samples/sec   Loss 0.8836   LearningRate 0.0002   Epoch: 19   Global Step: 237540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:08,204-Speed 3065.15 samples/sec   Loss 0.8957   LearningRate 0.0002   Epoch: 19   Global Step: 237550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:11,543-Speed 3068.18 samples/sec   Loss 0.8518   LearningRate 0.0002   Epoch: 19   Global Step: 237560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:14,868-Speed 3080.18 samples/sec   Loss 0.8703   LearningRate 0.0002   Epoch: 19   Global Step: 237570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:18,204-Speed 3070.37 samples/sec   Loss 0.8631   LearningRate 0.0002   Epoch: 19   Global Step: 237580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:21,610-Speed 3008.06 samples/sec   Loss 0.8546   LearningRate 0.0002   Epoch: 19   Global Step: 237590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:25,089-Speed 2943.87 samples/sec   Loss 0.8403   LearningRate 0.0002   Epoch: 19   Global Step: 237600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:28,500-Speed 3002.50 samples/sec   Loss 0.8603   LearningRate 0.0002   Epoch: 19   Global Step: 237610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:31,848-Speed 3059.29 samples/sec   Loss 0.8800   LearningRate 0.0002   Epoch: 19   Global Step: 237620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:35,293-Speed 2973.51 samples/sec   Loss 0.8767   LearningRate 0.0002   Epoch: 19   Global Step: 237630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:38,746-Speed 2966.28 samples/sec   Loss 0.8998   LearningRate 0.0002   Epoch: 19   Global Step: 237640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:42,104-Speed 3050.21 samples/sec   Loss 0.8771   LearningRate 0.0002   Epoch: 19   Global Step: 237650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:45,503-Speed 3013.77 samples/sec   Loss 0.8964   LearningRate 0.0002   Epoch: 19   Global Step: 237660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:48,835-Speed 3074.18 samples/sec   Loss 0.8826   LearningRate 0.0002   Epoch: 19   Global Step: 237670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:52,229-Speed 3018.12 samples/sec   Loss 0.8751   LearningRate 0.0002   Epoch: 19   Global Step: 237680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:55,548-Speed 3086.21 samples/sec   Loss 0.8488   LearningRate 0.0002   Epoch: 19   Global Step: 237690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:14:58,931-Speed 3028.23 samples/sec   Loss 0.8754   LearningRate 0.0002   Epoch: 19   Global Step: 237700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:02,408-Speed 2945.92 samples/sec   Loss 0.8877   LearningRate 0.0002   Epoch: 19   Global Step: 237710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:05,773-Speed 3043.18 samples/sec   Loss 0.8663   LearningRate 0.0002   Epoch: 19   Global Step: 237720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:09,194-Speed 2994.93 samples/sec   Loss 0.8586   LearningRate 0.0002   Epoch: 19   Global Step: 237730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:12,537-Speed 3064.89 samples/sec   Loss 0.8864   LearningRate 0.0002   Epoch: 19   Global Step: 237740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:15:15,884-Speed 3060.23 samples/sec   Loss 0.8867   LearningRate 0.0002   Epoch: 19   Global Step: 237750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:19,260-Speed 3033.73 samples/sec   Loss 0.8686   LearningRate 0.0002   Epoch: 19   Global Step: 237760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:22,676-Speed 2998.46 samples/sec   Loss 0.8705   LearningRate 0.0002   Epoch: 19   Global Step: 237770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:26,129-Speed 2966.24 samples/sec   Loss 0.8316   LearningRate 0.0002   Epoch: 19   Global Step: 237780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:29,454-Speed 3080.92 samples/sec   Loss 0.8561   LearningRate 0.0002   Epoch: 19   Global Step: 237790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:32,863-Speed 3004.52 samples/sec   Loss 0.8556   LearningRate 0.0002   Epoch: 19   Global Step: 237800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:36,322-Speed 2961.29 samples/sec   Loss 0.9000   LearningRate 0.0002   Epoch: 19   Global Step: 237810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:39,792-Speed 2951.85 samples/sec   Loss 0.8660   LearningRate 0.0002   Epoch: 19   Global Step: 237820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:43,182-Speed 3021.37 samples/sec   Loss 0.8267   LearningRate 0.0002   Epoch: 19   Global Step: 237830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:46,565-Speed 3028.19 samples/sec   Loss 0.8658   LearningRate 0.0002   Epoch: 19   Global Step: 237840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:49,941-Speed 3033.99 samples/sec   Loss 0.8531   LearningRate 0.0002   Epoch: 19   Global Step: 237850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:15:53,274-Speed 3072.76 samples/sec   Loss 0.8290   LearningRate 0.0002   Epoch: 19   Global Step: 237860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:15:56,657-Speed 3027.65 samples/sec   Loss 0.8572   LearningRate 0.0002   Epoch: 19   Global Step: 237870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:00,113-Speed 2964.09 samples/sec   Loss 0.8605   LearningRate 0.0002   Epoch: 19   Global Step: 237880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:03,478-Speed 3043.41 samples/sec   Loss 0.8915   LearningRate 0.0002   Epoch: 19   Global Step: 237890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:06,862-Speed 3027.21 samples/sec   Loss 0.8808   LearningRate 0.0002   Epoch: 19   Global Step: 237900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:10,247-Speed 3026.47 samples/sec   Loss 0.8406   LearningRate 0.0002   Epoch: 19   Global Step: 237910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:13,701-Speed 2964.77 samples/sec   Loss 0.9064   LearningRate 0.0002   Epoch: 19   Global Step: 237920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:17,083-Speed 3029.07 samples/sec   Loss 0.8894   LearningRate 0.0002   Epoch: 19   Global Step: 237930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:20,468-Speed 3025.76 samples/sec   Loss 0.8688   LearningRate 0.0002   Epoch: 19   Global Step: 237940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:23,869-Speed 3011.64 samples/sec   Loss 0.9020   LearningRate 0.0002   Epoch: 19   Global Step: 237950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:27,276-Speed 3006.74 samples/sec   Loss 0.8787   LearningRate 0.0002   Epoch: 19   Global Step: 237960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:16:30,642-Speed 3042.75 samples/sec   Loss 0.8739   LearningRate 0.0002   Epoch: 19   Global Step: 237970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:16:34,069-Speed 2988.61 samples/sec   Loss 0.8599   LearningRate 0.0002   Epoch: 19   Global Step: 237980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:37,405-Speed 3070.29 samples/sec   Loss 0.8676   LearningRate 0.0002   Epoch: 19   Global Step: 237990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:40,765-Speed 3048.63 samples/sec   Loss 0.8526   LearningRate 0.0002   Epoch: 19   Global Step: 238000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:44,211-Speed 2972.89 samples/sec   Loss 0.8781   LearningRate 0.0002   Epoch: 19   Global Step: 238010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:47,697-Speed 2938.21 samples/sec   Loss 0.8799   LearningRate 0.0002   Epoch: 19   Global Step: 238020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:51,072-Speed 3034.64 samples/sec   Loss 0.8498   LearningRate 0.0002   Epoch: 19   Global Step: 238030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:54,479-Speed 3006.07 samples/sec   Loss 0.8898   LearningRate 0.0002   Epoch: 19   Global Step: 238040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:16:57,886-Speed 3006.47 samples/sec   Loss 0.8740   LearningRate 0.0002   Epoch: 19   Global Step: 238050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:17:01,282-Speed 3016.14 samples/sec   Loss 0.8709   LearningRate 0.0002   Epoch: 19   Global Step: 238060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:17:04,597-Speed 3090.36 samples/sec   Loss 0.8932   LearningRate 0.0002   Epoch: 19   Global Step: 238070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:17:07,971-Speed 3035.70 samples/sec   Loss 0.8550   LearningRate 0.0002   Epoch: 19   Global Step: 238080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:11,403-Speed 2984.22 samples/sec   Loss 0.8895   LearningRate 0.0002   Epoch: 19   Global Step: 238090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:14,731-Speed 3078.22 samples/sec   Loss 0.8970   LearningRate 0.0002   Epoch: 19   Global Step: 238100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:18,158-Speed 2989.12 samples/sec   Loss 0.8505   LearningRate 0.0002   Epoch: 19   Global Step: 238110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:21,518-Speed 3048.23 samples/sec   Loss 0.8881   LearningRate 0.0002   Epoch: 19   Global Step: 238120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:24,913-Speed 3017.21 samples/sec   Loss 0.8662   LearningRate 0.0002   Epoch: 19   Global Step: 238130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:28,278-Speed 3044.05 samples/sec   Loss 0.8855   LearningRate 0.0002   Epoch: 19   Global Step: 238140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:31,661-Speed 3028.43 samples/sec   Loss 0.8581   LearningRate 0.0002   Epoch: 19   Global Step: 238150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:34,999-Speed 3068.19 samples/sec   Loss 0.8700   LearningRate 0.0002   Epoch: 19   Global Step: 238160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:38,462-Speed 2957.99 samples/sec   Loss 0.8665   LearningRate 0.0002   Epoch: 19   Global Step: 238170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:41,929-Speed 2953.80 samples/sec   Loss 0.8569   LearningRate 0.0002   Epoch: 19   Global Step: 238180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:17:45,254-Speed 3081.06 samples/sec   Loss 0.8928   LearningRate 0.0002   Epoch: 19   Global Step: 238190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:48,671-Speed 2997.46 samples/sec   Loss 0.8584   LearningRate 0.0002   Epoch: 19   Global Step: 238200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:52,032-Speed 3047.48 samples/sec   Loss 0.8440   LearningRate 0.0002   Epoch: 19   Global Step: 238210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:55,428-Speed 3016.07 samples/sec   Loss 0.8775   LearningRate 0.0002   Epoch: 19   Global Step: 238220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:17:58,841-Speed 3001.04 samples/sec   Loss 0.8740   LearningRate 0.0002   Epoch: 19   Global Step: 238230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:02,214-Speed 3039.04 samples/sec   Loss 0.8722   LearningRate 0.0002   Epoch: 19   Global Step: 238240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:05,543-Speed 3077.34 samples/sec   Loss 0.8768   LearningRate 0.0002   Epoch: 19   Global Step: 238250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:08,860-Speed 3087.33 samples/sec   Loss 0.8656   LearningRate 0.0002   Epoch: 19   Global Step: 238260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:12,248-Speed 3023.51 samples/sec   Loss 0.8519   LearningRate 0.0002   Epoch: 19   Global Step: 238270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:15,627-Speed 3031.82 samples/sec   Loss 0.8748   LearningRate 0.0002   Epoch: 19   Global Step: 238280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:18,991-Speed 3044.71 samples/sec   Loss 0.8675   LearningRate 0.0002   Epoch: 19   Global Step: 238290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:18:22,378-Speed 3024.08 samples/sec   Loss 0.8978   LearningRate 0.0002   Epoch: 19   Global Step: 238300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:25,731-Speed 3055.45 samples/sec   Loss 0.8728   LearningRate 0.0002   Epoch: 19   Global Step: 238310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:29,100-Speed 3040.26 samples/sec   Loss 0.8835   LearningRate 0.0002   Epoch: 19   Global Step: 238320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:32,541-Speed 2976.40 samples/sec   Loss 0.8799   LearningRate 0.0002   Epoch: 19   Global Step: 238330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:35,966-Speed 2991.33 samples/sec   Loss 0.8441   LearningRate 0.0002   Epoch: 19   Global Step: 238340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:39,432-Speed 2954.57 samples/sec   Loss 0.8337   LearningRate 0.0002   Epoch: 19   Global Step: 238350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:42,818-Speed 3025.45 samples/sec   Loss 0.8568   LearningRate 0.0002   Epoch: 19   Global Step: 238360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:46,182-Speed 3044.97 samples/sec   Loss 0.8022   LearningRate 0.0002   Epoch: 19   Global Step: 238370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:49,521-Speed 3067.25 samples/sec   Loss 0.8685   LearningRate 0.0002   Epoch: 19   Global Step: 238380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:52,890-Speed 3040.83 samples/sec   Loss 0.8395   LearningRate 0.0002   Epoch: 19   Global Step: 238390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:56,246-Speed 3052.03 samples/sec   Loss 0.8853   LearningRate 0.0002   Epoch: 19   Global Step: 238400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:18:59,625-Speed 3031.58 samples/sec   Loss 0.9187   LearningRate 0.0002   Epoch: 19   Global Step: 238410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:02,996-Speed 3038.59 samples/sec   Loss 0.8840   LearningRate 0.0002   Epoch: 19   Global Step: 238420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:06,447-Speed 2968.26 samples/sec   Loss 0.8871   LearningRate 0.0002   Epoch: 19   Global Step: 238430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:09,778-Speed 3074.37 samples/sec   Loss 0.8389   LearningRate 0.0002   Epoch: 19   Global Step: 238440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:13,126-Speed 3060.19 samples/sec   Loss 0.8899   LearningRate 0.0002   Epoch: 19   Global Step: 238450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:16,536-Speed 3003.29 samples/sec   Loss 0.8602   LearningRate 0.0002   Epoch: 19   Global Step: 238460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:20,003-Speed 2954.64 samples/sec   Loss 0.8309   LearningRate 0.0002   Epoch: 19   Global Step: 238470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:23,345-Speed 3064.61 samples/sec   Loss 0.8819   LearningRate 0.0002   Epoch: 19   Global Step: 238480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:26,735-Speed 3021.97 samples/sec   Loss 0.9302   LearningRate 0.0002   Epoch: 19   Global Step: 238490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:30,120-Speed 3025.51 samples/sec   Loss 0.8565   LearningRate 0.0002   Epoch: 19   Global Step: 238500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:19:33,461-Speed 3065.88 samples/sec   Loss 0.8736   LearningRate 0.0002   Epoch: 19   Global Step: 238510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:36,852-Speed 3021.02 samples/sec   Loss 0.8567   LearningRate 0.0002   Epoch: 19   Global Step: 238520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:40,226-Speed 3036.03 samples/sec   Loss 0.8948   LearningRate 0.0002   Epoch: 19   Global Step: 238530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:43,684-Speed 2962.02 samples/sec   Loss 0.8938   LearningRate 0.0002   Epoch: 19   Global Step: 238540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:47,057-Speed 3036.14 samples/sec   Loss 0.8701   LearningRate 0.0002   Epoch: 19   Global Step: 238550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:50,472-Speed 2999.82 samples/sec   Loss 0.8542   LearningRate 0.0002   Epoch: 19   Global Step: 238560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:53,900-Speed 2988.55 samples/sec   Loss 0.8663   LearningRate 0.0002   Epoch: 19   Global Step: 238570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:19:57,295-Speed 3016.73 samples/sec   Loss 0.8493   LearningRate 0.0002   Epoch: 19   Global Step: 238580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:00,665-Speed 3039.50 samples/sec   Loss 0.8181   LearningRate 0.0002   Epoch: 19   Global Step: 238590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:04,091-Speed 2989.99 samples/sec   Loss 0.8822   LearningRate 0.0002   Epoch: 19   Global Step: 238600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:07,518-Speed 2988.46 samples/sec   Loss 0.8268   LearningRate 0.0002   Epoch: 19   Global Step: 238610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:20:10,897-Speed 3031.57 samples/sec   Loss 0.8938   LearningRate 0.0002   Epoch: 19   Global Step: 238620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:14,274-Speed 3034.20 samples/sec   Loss 0.8854   LearningRate 0.0002   Epoch: 19   Global Step: 238630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:17,665-Speed 3020.01 samples/sec   Loss 0.8623   LearningRate 0.0002   Epoch: 19   Global Step: 238640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:21,127-Speed 2958.56 samples/sec   Loss 0.8565   LearningRate 0.0002   Epoch: 19   Global Step: 238650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:24,514-Speed 3024.72 samples/sec   Loss 0.8741   LearningRate 0.0002   Epoch: 19   Global Step: 238660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:27,887-Speed 3035.76 samples/sec   Loss 0.8644   LearningRate 0.0002   Epoch: 19   Global Step: 238670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:31,304-Speed 2997.85 samples/sec   Loss 0.9330   LearningRate 0.0002   Epoch: 19   Global Step: 238680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:34,666-Speed 3047.10 samples/sec   Loss 0.8780   LearningRate 0.0002   Epoch: 19   Global Step: 238690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:38,114-Speed 2970.37 samples/sec   Loss 0.8926   LearningRate 0.0002   Epoch: 19   Global Step: 238700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:41,530-Speed 2998.42 samples/sec   Loss 0.8752   LearningRate 0.0002   Epoch: 19   Global Step: 238710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:44,969-Speed 2978.49 samples/sec   Loss 0.8550   LearningRate 0.0002   Epoch: 19   Global Step: 238720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:20:48,293-Speed 3082.44 samples/sec   Loss 0.8562   LearningRate 0.0002   Epoch: 19   Global Step: 238730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:51,781-Speed 2936.84 samples/sec   Loss 0.8835   LearningRate 0.0002   Epoch: 19   Global Step: 238740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:20:55,194-Speed 3000.46 samples/sec   Loss 0.8589   LearningRate 0.0002   Epoch: 19   Global Step: 238750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:20:58,630-Speed 2981.37 samples/sec   Loss 0.8680   LearningRate 0.0002   Epoch: 19   Global Step: 238760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:21:02,020-Speed 3021.35 samples/sec   Loss 0.8772   LearningRate 0.0002   Epoch: 19   Global Step: 238770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:21:05,405-Speed 3030.70 samples/sec   Loss 0.8819   LearningRate 0.0002   Epoch: 19   Global Step: 238780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:21:08,870-Speed 2956.42 samples/sec   Loss 0.8614   LearningRate 0.0002   Epoch: 19   Global Step: 238790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:21:12,299-Speed 2987.17 samples/sec   Loss 0.8788   LearningRate 0.0001   Epoch: 19   Global Step: 238800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:21:15,693-Speed 3018.11 samples/sec   Loss 0.8807   LearningRate 0.0001   Epoch: 19   Global Step: 238810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:21:19,026-Speed 3072.83 samples/sec   Loss 0.8969   LearningRate 0.0001   Epoch: 19   Global Step: 238820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:21:22,446-Speed 2995.30 samples/sec   Loss 0.8609   LearningRate 0.0001   Epoch: 19   Global Step: 238830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:21:25,828-Speed 3028.31 samples/sec   Loss 0.8514   LearningRate 0.0001   Epoch: 19   Global Step: 238840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:21:29,295-Speed 2954.28 samples/sec   Loss 0.8598   LearningRate 0.0001   Epoch: 19   Global Step: 238850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:21:32,739-Speed 2974.25 samples/sec   Loss 0.8597   LearningRate 0.0001   Epoch: 19   Global Step: 238860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:21:36,162-Speed 2993.16 samples/sec   Loss 0.8612   LearningRate 0.0001   Epoch: 19   Global Step: 238870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:21:39,660-Speed 2928.55 samples/sec   Loss 0.8135   LearningRate 0.0001   Epoch: 19   Global Step: 238880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:21:43,172-Speed 2916.27 samples/sec   Loss 0.8550   LearningRate 0.0001   Epoch: 19   Global Step: 238890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:21:46,654-Speed 2941.69 samples/sec   Loss 0.8526   LearningRate 0.0001   Epoch: 19   Global Step: 238900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:21:50,107-Speed 2966.00 samples/sec   Loss 0.8547   LearningRate 0.0001   Epoch: 19   Global Step: 238910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:21:53,552-Speed 2973.93 samples/sec   Loss 0.8784   LearningRate 0.0001   Epoch: 19   Global Step: 238920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:21:56,909-Speed 3050.40 samples/sec   Loss 0.8781   LearningRate 0.0001   Epoch: 19   Global Step: 238930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:00,326-Speed 2997.71 samples/sec   Loss 0.8756   LearningRate 0.0001   Epoch: 19   Global Step: 238940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:03,759-Speed 2983.86 samples/sec   Loss 0.8320   LearningRate 0.0001   Epoch: 19   Global Step: 238950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:22:07,235-Speed 2946.46 samples/sec   Loss 0.8678   LearningRate 0.0001   Epoch: 19   Global Step: 238960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:10,566-Speed 3074.94 samples/sec   Loss 0.8678   LearningRate 0.0001   Epoch: 19   Global Step: 238970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:13,928-Speed 3047.32 samples/sec   Loss 0.8493   LearningRate 0.0001   Epoch: 19   Global Step: 238980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:17,297-Speed 3040.67 samples/sec   Loss 0.8353   LearningRate 0.0001   Epoch: 19   Global Step: 238990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:20,705-Speed 3005.33 samples/sec   Loss 0.8467   LearningRate 0.0001   Epoch: 19   Global Step: 239000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:24,095-Speed 3021.93 samples/sec   Loss 0.8934   LearningRate 0.0001   Epoch: 19   Global Step: 239010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:27,506-Speed 3002.45 samples/sec   Loss 0.8574   LearningRate 0.0001   Epoch: 19   Global Step: 239020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:30,930-Speed 2991.89 samples/sec   Loss 0.8527   LearningRate 0.0001   Epoch: 19   Global Step: 239030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:34,327-Speed 3014.91 samples/sec   Loss 0.9084   LearningRate 0.0001   Epoch: 19   Global Step: 239040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:37,692-Speed 3044.25 samples/sec   Loss 0.8823   LearningRate 0.0001   Epoch: 19   Global Step: 239050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:41,103-Speed 3002.90 samples/sec   Loss 0.8853   LearningRate 0.0001   Epoch: 19   Global Step: 239060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:44,468-Speed 3044.71 samples/sec   Loss 0.8807   LearningRate 0.0001   Epoch: 19   Global Step: 239070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:47,866-Speed 3014.03 samples/sec   Loss 0.8837   LearningRate 0.0001   Epoch: 19   Global Step: 239080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:51,237-Speed 3038.37 samples/sec   Loss 0.8846   LearningRate 0.0001   Epoch: 19   Global Step: 239090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:54,702-Speed 2956.35 samples/sec   Loss 0.8643   LearningRate 0.0001   Epoch: 19   Global Step: 239100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:22:58,067-Speed 3043.13 samples/sec   Loss 0.8309   LearningRate 0.0001   Epoch: 19   Global Step: 239110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:23:01,404-Speed 3069.84 samples/sec   Loss 0.8801   LearningRate 0.0001   Epoch: 19   Global Step: 239120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:23:04,726-Speed 3083.69 samples/sec   Loss 0.8375   LearningRate 0.0001   Epoch: 19   Global Step: 239130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:23:08,004-Speed 3124.93 samples/sec   Loss 0.8879   LearningRate 0.0001   Epoch: 19   Global Step: 239140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:23:11,363-Speed 3048.81 samples/sec   Loss 0.8692   LearningRate 0.0001   Epoch: 19   Global Step: 239150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:23:14,766-Speed 3010.71 samples/sec   Loss 0.8761   LearningRate 0.0001   Epoch: 19   Global Step: 239160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:23:18,140-Speed 3035.09 samples/sec   Loss 0.8713   LearningRate 0.0001   Epoch: 19   Global Step: 239170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:23:21,497-Speed 3051.86 samples/sec   Loss 0.8730   LearningRate 0.0001   Epoch: 19   Global Step: 239180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:23:24,843-Speed 3060.69 samples/sec   Loss 0.8880   LearningRate 0.0001   Epoch: 19   Global Step: 239190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:23:28,161-Speed 3087.07 samples/sec   Loss 0.8733   LearningRate 0.0001   Epoch: 19   Global Step: 239200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:23:31,556-Speed 3016.85 samples/sec   Loss 0.8996   LearningRate 0.0001   Epoch: 19   Global Step: 239210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:23:34,906-Speed 3058.07 samples/sec   Loss 0.8353   LearningRate 0.0001   Epoch: 19   Global Step: 239220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:23:38,279-Speed 3035.86 samples/sec   Loss 0.8410   LearningRate 0.0001   Epoch: 19   Global Step: 239230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:23:41,667-Speed 3023.97 samples/sec   Loss 0.8736   LearningRate 0.0001   Epoch: 19   Global Step: 239240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:23:44,987-Speed 3085.45 samples/sec   Loss 0.8872   LearningRate 0.0001   Epoch: 19   Global Step: 239250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:23:48,352-Speed 3043.79 samples/sec   Loss 0.8775   LearningRate 0.0001   Epoch: 19   Global Step: 239260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:23:51,773-Speed 2994.70 samples/sec   Loss 0.8291   LearningRate 0.0001   Epoch: 19   Global Step: 239270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:23:55,189-Speed 2998.39 samples/sec   Loss 0.8610   LearningRate 0.0001   Epoch: 19   Global Step: 239280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:23:58,583-Speed 3017.39 samples/sec   Loss 0.8572   LearningRate 0.0001   Epoch: 19   Global Step: 239290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:24:02,018-Speed 2982.06 samples/sec   Loss 0.8717   LearningRate 0.0001   Epoch: 19   Global Step: 239300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:24:05,464-Speed 2972.52 samples/sec   Loss 0.8862   LearningRate 0.0001   Epoch: 19   Global Step: 239310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:24:08,792-Speed 3077.72 samples/sec   Loss 0.8945   LearningRate 0.0001   Epoch: 19   Global Step: 239320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:24:12,223-Speed 2985.16 samples/sec   Loss 0.9126   LearningRate 0.0001   Epoch: 19   Global Step: 239330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:24:15,657-Speed 2991.82 samples/sec   Loss 0.8808   LearningRate 0.0001   Epoch: 19   Global Step: 239340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:24:19,026-Speed 3040.33 samples/sec   Loss 0.9081   LearningRate 0.0001   Epoch: 19   Global Step: 239350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:24:22,409-Speed 3027.56 samples/sec   Loss 0.9038   LearningRate 0.0001   Epoch: 19   Global Step: 239360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:24:25,718-Speed 3095.67 samples/sec   Loss 0.8652   LearningRate 0.0001   Epoch: 19   Global Step: 239370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:24:29,110-Speed 3019.85 samples/sec   Loss 0.8802   LearningRate 0.0001   Epoch: 19   Global Step: 239380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:24:32,429-Speed 3086.26 samples/sec   Loss 0.8358   LearningRate 0.0001   Epoch: 19   Global Step: 239390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:24:35,742-Speed 3092.69 samples/sec   Loss 0.8790   LearningRate 0.0001   Epoch: 19   Global Step: 239400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:24:39,235-Speed 2932.05 samples/sec   Loss 0.8477   LearningRate 0.0001   Epoch: 19   Global Step: 239410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:24:42,673-Speed 2979.75 samples/sec   Loss 0.8254   LearningRate 0.0001   Epoch: 19   Global Step: 239420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:24:46,002-Speed 3076.60 samples/sec   Loss 0.8665   LearningRate 0.0001   Epoch: 19   Global Step: 239430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:24:49,399-Speed 3015.22 samples/sec   Loss 0.8730   LearningRate 0.0001   Epoch: 19   Global Step: 239440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:24:52,733-Speed 3072.43 samples/sec   Loss 0.8410   LearningRate 0.0001   Epoch: 19   Global Step: 239450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:24:56,096-Speed 3045.29 samples/sec   Loss 0.8874   LearningRate 0.0001   Epoch: 19   Global Step: 239460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:24:59,556-Speed 2960.43 samples/sec   Loss 0.9084   LearningRate 0.0001   Epoch: 19   Global Step: 239470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:25:02,941-Speed 3025.80 samples/sec   Loss 0.8472   LearningRate 0.0001   Epoch: 19   Global Step: 239480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:25:06,287-Speed 3061.74 samples/sec   Loss 0.8985   LearningRate 0.0001   Epoch: 19   Global Step: 239490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:25:09,673-Speed 3024.54 samples/sec   Loss 0.8854   LearningRate 0.0001   Epoch: 19   Global Step: 239500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:25:13,061-Speed 3023.27 samples/sec   Loss 0.8809   LearningRate 0.0001   Epoch: 19   Global Step: 239510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:25:16,484-Speed 2992.46 samples/sec   Loss 0.8889   LearningRate 0.0001   Epoch: 19   Global Step: 239520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:25:19,899-Speed 2999.29 samples/sec   Loss 0.8716   LearningRate 0.0001   Epoch: 19   Global Step: 239530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:25:23,319-Speed 2994.56 samples/sec   Loss 0.8426   LearningRate 0.0001   Epoch: 19   Global Step: 239540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:25:26,767-Speed 2971.44 samples/sec   Loss 0.9065   LearningRate 0.0001   Epoch: 19   Global Step: 239550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:25:30,130-Speed 3045.50 samples/sec   Loss 0.8468   LearningRate 0.0001   Epoch: 19   Global Step: 239560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:25:33,556-Speed 2990.05 samples/sec   Loss 0.8696   LearningRate 0.0001   Epoch: 19   Global Step: 239570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:25:36,908-Speed 3054.99 samples/sec   Loss 0.8493   LearningRate 0.0001   Epoch: 19   Global Step: 239580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:25:40,232-Speed 3082.54 samples/sec   Loss 0.8426   LearningRate 0.0001   Epoch: 19   Global Step: 239590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:25:43,638-Speed 3006.58 samples/sec   Loss 0.8891   LearningRate 0.0001   Epoch: 19   Global Step: 239600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:25:46,980-Speed 3064.87 samples/sec   Loss 0.8454   LearningRate 0.0001   Epoch: 19   Global Step: 239610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:25:50,356-Speed 3034.74 samples/sec   Loss 0.9104   LearningRate 0.0001   Epoch: 19   Global Step: 239620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:25:53,768-Speed 3002.44 samples/sec   Loss 0.8919   LearningRate 0.0001   Epoch: 19   Global Step: 239630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:25:57,127-Speed 3049.06 samples/sec   Loss 0.9037   LearningRate 0.0001   Epoch: 19   Global Step: 239640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:26:00,472-Speed 3062.30 samples/sec   Loss 0.8668   LearningRate 0.0001   Epoch: 19   Global Step: 239650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:26:03,914-Speed 2976.06 samples/sec   Loss 0.8996   LearningRate 0.0001   Epoch: 19   Global Step: 239660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:26:07,295-Speed 3029.35 samples/sec   Loss 0.8566   LearningRate 0.0001   Epoch: 19   Global Step: 239670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:26:10,719-Speed 2991.30 samples/sec   Loss 0.8450   LearningRate 0.0001   Epoch: 19   Global Step: 239680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:26:14,131-Speed 3002.17 samples/sec   Loss 0.8498   LearningRate 0.0001   Epoch: 19   Global Step: 239690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:26:17,544-Speed 3001.32 samples/sec   Loss 0.8892   LearningRate 0.0001   Epoch: 19   Global Step: 239700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:26:20,980-Speed 2980.32 samples/sec   Loss 0.8688   LearningRate 0.0001   Epoch: 19   Global Step: 239710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:26:24,426-Speed 2972.61 samples/sec   Loss 0.8770   LearningRate 0.0001   Epoch: 19   Global Step: 239720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:26:27,835-Speed 3004.58 samples/sec   Loss 0.8499   LearningRate 0.0001   Epoch: 19   Global Step: 239730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:26:31,271-Speed 2981.25 samples/sec   Loss 0.9033   LearningRate 0.0001   Epoch: 19   Global Step: 239740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:26:34,697-Speed 2989.36 samples/sec   Loss 0.8658   LearningRate 0.0001   Epoch: 19   Global Step: 239750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:26:38,047-Speed 3057.84 samples/sec   Loss 0.8682   LearningRate 0.0001   Epoch: 19   Global Step: 239760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:26:41,451-Speed 3009.02 samples/sec   Loss 0.8709   LearningRate 0.0001   Epoch: 19   Global Step: 239770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:26:44,838-Speed 3023.84 samples/sec   Loss 0.8677   LearningRate 0.0001   Epoch: 19   Global Step: 239780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:26:48,268-Speed 2986.00 samples/sec   Loss 0.8511   LearningRate 0.0001   Epoch: 19   Global Step: 239790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:26:51,707-Speed 2978.83 samples/sec   Loss 0.9061   LearningRate 0.0001   Epoch: 19   Global Step: 239800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:26:55,110-Speed 3009.89 samples/sec   Loss 0.8540   LearningRate 0.0001   Epoch: 19   Global Step: 239810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:26:58,556-Speed 2972.61 samples/sec   Loss 0.8914   LearningRate 0.0001   Epoch: 19   Global Step: 239820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:27:01,935-Speed 3030.81 samples/sec   Loss 0.8936   LearningRate 0.0001   Epoch: 19   Global Step: 239830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:27:05,307-Speed 3037.50 samples/sec   Loss 0.8812   LearningRate 0.0001   Epoch: 19   Global Step: 239840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:27:08,759-Speed 2967.28 samples/sec   Loss 0.8849   LearningRate 0.0001   Epoch: 19   Global Step: 239850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:27:12,232-Speed 2949.73 samples/sec   Loss 0.8695   LearningRate 0.0001   Epoch: 19   Global Step: 239860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:27:15,553-Speed 3083.74 samples/sec   Loss 0.8975   LearningRate 0.0001   Epoch: 19   Global Step: 239870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:27:18,991-Speed 2979.43 samples/sec   Loss 0.8549   LearningRate 0.0001   Epoch: 19   Global Step: 239880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:27:22,345-Speed 3053.82 samples/sec   Loss 0.8454   LearningRate 0.0001   Epoch: 19   Global Step: 239890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:27:25,742-Speed 3014.85 samples/sec   Loss 0.8770   LearningRate 0.0001   Epoch: 19   Global Step: 239900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:27:29,217-Speed 2947.91 samples/sec   Loss 0.8469   LearningRate 0.0001   Epoch: 19   Global Step: 239910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:27:32,645-Speed 2988.12 samples/sec   Loss 0.8631   LearningRate 0.0001   Epoch: 19   Global Step: 239920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:27:36,091-Speed 2972.41 samples/sec   Loss 0.8766   LearningRate 0.0001   Epoch: 19   Global Step: 239930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:27:39,455-Speed 3044.99 samples/sec   Loss 0.8638   LearningRate 0.0001   Epoch: 19   Global Step: 239940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:27:42,822-Speed 3042.25 samples/sec   Loss 0.8546   LearningRate 0.0001   Epoch: 19   Global Step: 239950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:27:46,189-Speed 3042.10 samples/sec   Loss 0.8978   LearningRate 0.0001   Epoch: 19   Global Step: 239960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:27:49,605-Speed 2998.38 samples/sec   Loss 0.8486   LearningRate 0.0001   Epoch: 19   Global Step: 239970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:27:53,036-Speed 2985.23 samples/sec   Loss 0.9117   LearningRate 0.0001   Epoch: 19   Global Step: 239980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:27:56,436-Speed 3012.36 samples/sec   Loss 0.8449   LearningRate 0.0001   Epoch: 19   Global Step: 239990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:27:59,914-Speed 2945.57 samples/sec   Loss 0.8958   LearningRate 0.0001   Epoch: 19   Global Step: 240000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:28:03,363-Speed 2969.27 samples/sec   Loss 0.8965   LearningRate 0.0001   Epoch: 19   Global Step: 240010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:28:06,776-Speed 3001.71 samples/sec   Loss 0.8704   LearningRate 0.0001   Epoch: 19   Global Step: 240020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:28:10,128-Speed 3055.12 samples/sec   Loss 0.8536   LearningRate 0.0001   Epoch: 19   Global Step: 240030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:28:13,523-Speed 3016.93 samples/sec   Loss 0.8664   LearningRate 0.0001   Epoch: 19   Global Step: 240040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:28:16,930-Speed 3007.93 samples/sec   Loss 0.8787   LearningRate 0.0001   Epoch: 19   Global Step: 240050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:28:20,354-Speed 2991.33 samples/sec   Loss 0.9036   LearningRate 0.0001   Epoch: 19   Global Step: 240060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:28:23,842-Speed 2936.63 samples/sec   Loss 0.8386   LearningRate 0.0001   Epoch: 19   Global Step: 240070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:28:27,247-Speed 3008.42 samples/sec   Loss 0.8727   LearningRate 0.0001   Epoch: 19   Global Step: 240080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:28:30,569-Speed 3082.85 samples/sec   Loss 0.8532   LearningRate 0.0001   Epoch: 19   Global Step: 240090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:28:33,927-Speed 3050.86 samples/sec   Loss 0.8813   LearningRate 0.0001   Epoch: 19   Global Step: 240100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:28:37,274-Speed 3060.05 samples/sec   Loss 0.8862   LearningRate 0.0001   Epoch: 19   Global Step: 240110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:28:40,645-Speed 3038.42 samples/sec   Loss 0.8374   LearningRate 0.0001   Epoch: 19   Global Step: 240120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:28:43,994-Speed 3058.12 samples/sec   Loss 0.8741   LearningRate 0.0001   Epoch: 19   Global Step: 240130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:28:47,334-Speed 3066.77 samples/sec   Loss 0.8548   LearningRate 0.0001   Epoch: 19   Global Step: 240140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:28:50,748-Speed 3000.48 samples/sec   Loss 0.8948   LearningRate 0.0001   Epoch: 19   Global Step: 240150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:28:54,215-Speed 2954.22 samples/sec   Loss 0.8768   LearningRate 0.0001   Epoch: 19   Global Step: 240160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:28:57,668-Speed 2966.26 samples/sec   Loss 0.8310   LearningRate 0.0001   Epoch: 19   Global Step: 240170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:29:01,109-Speed 2976.49 samples/sec   Loss 0.8251   LearningRate 0.0001   Epoch: 19   Global Step: 240180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:29:04,520-Speed 3003.15 samples/sec   Loss 0.8698   LearningRate 0.0001   Epoch: 19   Global Step: 240190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:29:07,909-Speed 3022.37 samples/sec   Loss 0.8354   LearningRate 0.0001   Epoch: 19   Global Step: 240200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:29:11,239-Speed 3075.81 samples/sec   Loss 0.9171   LearningRate 0.0001   Epoch: 19   Global Step: 240210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:29:14,618-Speed 3031.64 samples/sec   Loss 0.8684   LearningRate 0.0001   Epoch: 19   Global Step: 240220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:29:17,977-Speed 3049.50 samples/sec   Loss 0.8588   LearningRate 0.0001   Epoch: 19   Global Step: 240230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:29:21,342-Speed 3043.29 samples/sec   Loss 0.8577   LearningRate 0.0001   Epoch: 19   Global Step: 240240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:29:24,702-Speed 3048.65 samples/sec   Loss 0.8425   LearningRate 0.0001   Epoch: 19   Global Step: 240250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:29:28,124-Speed 2992.98 samples/sec   Loss 0.8684   LearningRate 0.0001   Epoch: 19   Global Step: 240260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:29:31,613-Speed 2935.89 samples/sec   Loss 0.8760   LearningRate 0.0001   Epoch: 19   Global Step: 240270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:29:35,056-Speed 2975.28 samples/sec   Loss 0.9070   LearningRate 0.0001   Epoch: 19   Global Step: 240280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:29:38,514-Speed 2962.83 samples/sec   Loss 0.8601   LearningRate 0.0001   Epoch: 19   Global Step: 240290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:29:41,924-Speed 3003.75 samples/sec   Loss 0.9272   LearningRate 0.0001   Epoch: 19   Global Step: 240300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:29:45,298-Speed 3035.69 samples/sec   Loss 0.8899   LearningRate 0.0001   Epoch: 19   Global Step: 240310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:29:48,650-Speed 3056.29 samples/sec   Loss 0.8723   LearningRate 0.0001   Epoch: 19   Global Step: 240320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:29:52,002-Speed 3055.55 samples/sec   Loss 0.8838   LearningRate 0.0001   Epoch: 19   Global Step: 240330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:29:55,479-Speed 2945.91 samples/sec   Loss 0.8701   LearningRate 0.0001   Epoch: 19   Global Step: 240340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:29:58,862-Speed 3027.63 samples/sec   Loss 0.8583   LearningRate 0.0001   Epoch: 19   Global Step: 240350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:02,211-Speed 3058.62 samples/sec   Loss 0.8947   LearningRate 0.0001   Epoch: 19   Global Step: 240360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:05,596-Speed 3025.62 samples/sec   Loss 0.8597   LearningRate 0.0001   Epoch: 19   Global Step: 240370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:09,007-Speed 3003.71 samples/sec   Loss 0.8490   LearningRate 0.0001   Epoch: 19   Global Step: 240380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:12,405-Speed 3013.66 samples/sec   Loss 0.8769   LearningRate 0.0001   Epoch: 19   Global Step: 240390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:15,820-Speed 2999.17 samples/sec   Loss 0.8799   LearningRate 0.0001   Epoch: 19   Global Step: 240400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:30:19,200-Speed 3031.08 samples/sec   Loss 0.8495   LearningRate 0.0001   Epoch: 19   Global Step: 240410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:22,640-Speed 2976.87 samples/sec   Loss 0.8787   LearningRate 0.0001   Epoch: 19   Global Step: 240420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:25,999-Speed 3050.28 samples/sec   Loss 0.8844   LearningRate 0.0001   Epoch: 19   Global Step: 240430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:29,369-Speed 3039.04 samples/sec   Loss 0.8561   LearningRate 0.0001   Epoch: 19   Global Step: 240440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:32,809-Speed 2977.94 samples/sec   Loss 0.8915   LearningRate 0.0001   Epoch: 19   Global Step: 240450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:36,128-Speed 3085.56 samples/sec   Loss 0.8805   LearningRate 0.0001   Epoch: 19   Global Step: 240460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:39,502-Speed 3035.81 samples/sec   Loss 0.8738   LearningRate 0.0001   Epoch: 19   Global Step: 240470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:42,947-Speed 2973.46 samples/sec   Loss 0.8193   LearningRate 0.0001   Epoch: 19   Global Step: 240480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:46,300-Speed 3054.67 samples/sec   Loss 0.8773   LearningRate 0.0001   Epoch: 19   Global Step: 240490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:49,729-Speed 2987.26 samples/sec   Loss 0.8784   LearningRate 0.0001   Epoch: 19   Global Step: 240500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:53,114-Speed 3025.67 samples/sec   Loss 0.8536   LearningRate 0.0001   Epoch: 19   Global Step: 240510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:30:56,443-Speed 3077.32 samples/sec   Loss 0.8357   LearningRate 0.0001   Epoch: 19   Global Step: 240520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:30:59,862-Speed 2995.92 samples/sec   Loss 0.8469   LearningRate 0.0001   Epoch: 19   Global Step: 240530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:31:03,249-Speed 3024.04 samples/sec   Loss 0.8777   LearningRate 0.0001   Epoch: 19   Global Step: 240540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:31:06,622-Speed 3036.81 samples/sec   Loss 0.9257   LearningRate 0.0001   Epoch: 19   Global Step: 240550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:31:09,978-Speed 3051.68 samples/sec   Loss 0.8858   LearningRate 0.0001   Epoch: 19   Global Step: 240560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:31:13,382-Speed 3009.44 samples/sec   Loss 0.9325   LearningRate 0.0001   Epoch: 19   Global Step: 240570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:31:16,812-Speed 2985.76 samples/sec   Loss 0.8409   LearningRate 0.0001   Epoch: 19   Global Step: 240580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:31:20,218-Speed 3007.75 samples/sec   Loss 0.8513   LearningRate 0.0001   Epoch: 19   Global Step: 240590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:31:23,667-Speed 2969.37 samples/sec   Loss 0.8505   LearningRate 0.0001   Epoch: 19   Global Step: 240600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:31:27,077-Speed 3004.14 samples/sec   Loss 0.8988   LearningRate 0.0001   Epoch: 19   Global Step: 240610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:31:30,427-Speed 3057.45 samples/sec   Loss 0.8276   LearningRate 0.0001   Epoch: 19   Global Step: 240620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:31:33,816-Speed 3022.33 samples/sec   Loss 0.8549   LearningRate 0.0001   Epoch: 19   Global Step: 240630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:31:37,155-Speed 3067.42 samples/sec   Loss 0.8469   LearningRate 0.0001   Epoch: 19   Global Step: 240640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:31:40,531-Speed 3034.61 samples/sec   Loss 0.8679   LearningRate 0.0001   Epoch: 19   Global Step: 240650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:31:43,894-Speed 3045.82 samples/sec   Loss 0.8473   LearningRate 0.0001   Epoch: 19   Global Step: 240660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:31:47,295-Speed 3011.54 samples/sec   Loss 0.8539   LearningRate 0.0001   Epoch: 19   Global Step: 240670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:31:50,679-Speed 3027.14 samples/sec   Loss 0.8785   LearningRate 0.0001   Epoch: 19   Global Step: 240680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:31:54,013-Speed 3072.37 samples/sec   Loss 0.8740   LearningRate 0.0001   Epoch: 19   Global Step: 240690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:31:57,359-Speed 3061.42 samples/sec   Loss 0.9034   LearningRate 0.0001   Epoch: 19   Global Step: 240700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:32:00,802-Speed 2974.89 samples/sec   Loss 0.9036   LearningRate 0.0001   Epoch: 19   Global Step: 240710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:32:04,139-Speed 3069.93 samples/sec   Loss 0.8750   LearningRate 0.0001   Epoch: 19   Global Step: 240720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:32:07,539-Speed 3012.37 samples/sec   Loss 0.8871   LearningRate 0.0001   Epoch: 19   Global Step: 240730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:32:10,934-Speed 3017.30 samples/sec   Loss 0.8814   LearningRate 0.0001   Epoch: 19   Global Step: 240740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:32:14,367-Speed 2983.32 samples/sec   Loss 0.8765   LearningRate 0.0001   Epoch: 19   Global Step: 240750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:32:17,713-Speed 3063.16 samples/sec   Loss 0.8540   LearningRate 0.0001   Epoch: 19   Global Step: 240760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:32:21,111-Speed 3014.48 samples/sec   Loss 0.8331   LearningRate 0.0001   Epoch: 19   Global Step: 240770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:32:24,491-Speed 3030.10 samples/sec   Loss 0.8681   LearningRate 0.0001   Epoch: 19   Global Step: 240780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:32:27,790-Speed 3105.32 samples/sec   Loss 0.8797   LearningRate 0.0001   Epoch: 19   Global Step: 240790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:32:31,171-Speed 3029.18 samples/sec   Loss 0.8478   LearningRate 0.0001   Epoch: 19   Global Step: 240800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:32:34,497-Speed 3079.94 samples/sec   Loss 0.8720   LearningRate 0.0001   Epoch: 19   Global Step: 240810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:32:37,870-Speed 3036.52 samples/sec   Loss 0.8656   LearningRate 0.0001   Epoch: 19   Global Step: 240820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:32:41,208-Speed 3068.27 samples/sec   Loss 0.8420   LearningRate 0.0001   Epoch: 19   Global Step: 240830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:32:44,581-Speed 3037.01 samples/sec   Loss 0.8965   LearningRate 0.0001   Epoch: 19   Global Step: 240840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:32:47,968-Speed 3024.35 samples/sec   Loss 0.8969   LearningRate 0.0001   Epoch: 19   Global Step: 240850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:32:51,307-Speed 3067.15 samples/sec   Loss 0.8421   LearningRate 0.0001   Epoch: 19   Global Step: 240860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:32:54,631-Speed 3081.35 samples/sec   Loss 0.8682   LearningRate 0.0001   Epoch: 19   Global Step: 240870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:32:57,960-Speed 3077.24 samples/sec   Loss 0.8904   LearningRate 0.0001   Epoch: 19   Global Step: 240880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:33:01,374-Speed 3000.16 samples/sec   Loss 0.8776   LearningRate 0.0001   Epoch: 19   Global Step: 240890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:33:04,725-Speed 3057.32 samples/sec   Loss 0.8832   LearningRate 0.0001   Epoch: 19   Global Step: 240900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:33:08,140-Speed 2999.16 samples/sec   Loss 0.8555   LearningRate 0.0001   Epoch: 19   Global Step: 240910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:33:11,585-Speed 2973.83 samples/sec   Loss 0.8592   LearningRate 0.0001   Epoch: 19   Global Step: 240920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:15,043-Speed 2961.67 samples/sec   Loss 0.8610   LearningRate 0.0001   Epoch: 19   Global Step: 240930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:18,475-Speed 2984.66 samples/sec   Loss 0.8950   LearningRate 0.0001   Epoch: 19   Global Step: 240940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:21,892-Speed 2997.09 samples/sec   Loss 0.9010   LearningRate 0.0001   Epoch: 19   Global Step: 240950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:25,287-Speed 3017.65 samples/sec   Loss 0.8693   LearningRate 0.0001   Epoch: 19   Global Step: 240960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:28,681-Speed 3018.26 samples/sec   Loss 0.8443   LearningRate 0.0001   Epoch: 19   Global Step: 240970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:32,040-Speed 3049.47 samples/sec   Loss 0.8708   LearningRate 0.0001   Epoch: 19   Global Step: 240980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:35,376-Speed 3070.60 samples/sec   Loss 0.8363   LearningRate 0.0001   Epoch: 19   Global Step: 240990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:38,777-Speed 3011.68 samples/sec   Loss 0.8903   LearningRate 0.0001   Epoch: 19   Global Step: 241000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:42,219-Speed 2975.80 samples/sec   Loss 0.8858   LearningRate 0.0001   Epoch: 19   Global Step: 241010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:45,528-Speed 3095.59 samples/sec   Loss 0.8611   LearningRate 0.0001   Epoch: 19   Global Step: 241020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:33:48,888-Speed 3047.91 samples/sec   Loss 0.8644   LearningRate 0.0001   Epoch: 19   Global Step: 241030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:52,405-Speed 2912.75 samples/sec   Loss 0.8945   LearningRate 0.0001   Epoch: 19   Global Step: 241040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:55,761-Speed 3051.61 samples/sec   Loss 0.8553   LearningRate 0.0001   Epoch: 19   Global Step: 241050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:33:59,077-Speed 3089.29 samples/sec   Loss 0.9115   LearningRate 0.0001   Epoch: 19   Global Step: 241060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:02,411-Speed 3072.18 samples/sec   Loss 0.8565   LearningRate 0.0001   Epoch: 19   Global Step: 241070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:05,800-Speed 3021.82 samples/sec   Loss 0.8826   LearningRate 0.0001   Epoch: 19   Global Step: 241080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:09,217-Speed 2998.15 samples/sec   Loss 0.9022   LearningRate 0.0001   Epoch: 19   Global Step: 241090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:12,558-Speed 3065.32 samples/sec   Loss 0.9395   LearningRate 0.0001   Epoch: 19   Global Step: 241100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:15,920-Speed 3047.02 samples/sec   Loss 0.8648   LearningRate 0.0001   Epoch: 19   Global Step: 241110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:19,330-Speed 3004.14 samples/sec   Loss 0.9125   LearningRate 0.0001   Epoch: 19   Global Step: 241120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:22,679-Speed 3058.17 samples/sec   Loss 0.8857   LearningRate 0.0001   Epoch: 19   Global Step: 241130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:34:26,017-Speed 3068.82 samples/sec   Loss 0.8360   LearningRate 0.0001   Epoch: 19   Global Step: 241140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:29,426-Speed 3004.57 samples/sec   Loss 0.8791   LearningRate 0.0001   Epoch: 19   Global Step: 241150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:32,808-Speed 3028.23 samples/sec   Loss 0.9129   LearningRate 0.0001   Epoch: 19   Global Step: 241160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:36,166-Speed 3050.71 samples/sec   Loss 0.8341   LearningRate 0.0001   Epoch: 19   Global Step: 241170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:39,508-Speed 3065.07 samples/sec   Loss 0.9102   LearningRate 0.0001   Epoch: 19   Global Step: 241180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:42,905-Speed 3015.10 samples/sec   Loss 0.8942   LearningRate 0.0001   Epoch: 19   Global Step: 241190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:46,293-Speed 3022.97 samples/sec   Loss 0.8504   LearningRate 0.0001   Epoch: 19   Global Step: 241200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:49,611-Speed 3087.46 samples/sec   Loss 0.8682   LearningRate 0.0001   Epoch: 19   Global Step: 241210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:53,001-Speed 3021.43 samples/sec   Loss 0.9095   LearningRate 0.0001   Epoch: 19   Global Step: 241220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:56,498-Speed 2929.03 samples/sec   Loss 0.9153   LearningRate 0.0001   Epoch: 19   Global Step: 241230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:34:59,897-Speed 3014.16 samples/sec   Loss 0.8445   LearningRate 0.0001   Epoch: 19   Global Step: 241240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:35:03,318-Speed 2993.84 samples/sec   Loss 0.8687   LearningRate 0.0001   Epoch: 19   Global Step: 241250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:35:06,741-Speed 2993.03 samples/sec   Loss 0.8672   LearningRate 0.0001   Epoch: 19   Global Step: 241260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:35:10,048-Speed 3097.06 samples/sec   Loss 0.8510   LearningRate 0.0001   Epoch: 19   Global Step: 241270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:35:13,380-Speed 3074.95 samples/sec   Loss 0.8217   LearningRate 0.0001   Epoch: 19   Global Step: 241280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:35:16,752-Speed 3037.49 samples/sec   Loss 0.8560   LearningRate 0.0001   Epoch: 19   Global Step: 241290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:35:20,092-Speed 3065.94 samples/sec   Loss 0.8715   LearningRate 0.0001   Epoch: 19   Global Step: 241300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:35:23,402-Speed 3094.40 samples/sec   Loss 0.8742   LearningRate 0.0001   Epoch: 19   Global Step: 241310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:35:26,791-Speed 3023.03 samples/sec   Loss 0.8405   LearningRate 0.0001   Epoch: 19   Global Step: 241320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:35:30,131-Speed 3066.09 samples/sec   Loss 0.8467   LearningRate 0.0001   Epoch: 19   Global Step: 241330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:35:33,464-Speed 3073.34 samples/sec   Loss 0.9196   LearningRate 0.0001   Epoch: 19   Global Step: 241340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:35:36,799-Speed 3071.60 samples/sec   Loss 0.8617   LearningRate 0.0001   Epoch: 19   Global Step: 241350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:35:40,139-Speed 3066.44 samples/sec   Loss 0.8579   LearningRate 0.0001   Epoch: 19   Global Step: 241360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:35:43,493-Speed 3054.26 samples/sec   Loss 0.8287   LearningRate 0.0001   Epoch: 19   Global Step: 241370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:35:46,832-Speed 3067.75 samples/sec   Loss 0.8882   LearningRate 0.0001   Epoch: 19   Global Step: 241380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:35:50,232-Speed 3012.21 samples/sec   Loss 0.8578   LearningRate 0.0001   Epoch: 19   Global Step: 241390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:35:53,595-Speed 3046.15 samples/sec   Loss 0.8994   LearningRate 0.0001   Epoch: 19   Global Step: 241400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:35:56,987-Speed 3019.76 samples/sec   Loss 0.8688   LearningRate 0.0001   Epoch: 19   Global Step: 241410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:36:00,495-Speed 2919.88 samples/sec   Loss 0.8585   LearningRate 0.0001   Epoch: 19   Global Step: 241420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:36:04,002-Speed 2920.52 samples/sec   Loss 0.8642   LearningRate 0.0001   Epoch: 19   Global Step: 241430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:36:07,389-Speed 3024.31 samples/sec   Loss 0.8448   LearningRate 0.0001   Epoch: 19   Global Step: 241440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:36:10,751-Speed 3046.27 samples/sec   Loss 0.8801   LearningRate 0.0001   Epoch: 19   Global Step: 241450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:36:14,233-Speed 2942.44 samples/sec   Loss 0.9043   LearningRate 0.0001   Epoch: 19   Global Step: 241460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:36:17,667-Speed 2982.63 samples/sec   Loss 0.8749   LearningRate 0.0001   Epoch: 19   Global Step: 241470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:36:21,101-Speed 2982.43 samples/sec   Loss 0.8571   LearningRate 0.0001   Epoch: 19   Global Step: 241480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:36:24,485-Speed 3026.74 samples/sec   Loss 0.9006   LearningRate 0.0001   Epoch: 19   Global Step: 241490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:36:27,834-Speed 3058.33 samples/sec   Loss 0.8477   LearningRate 0.0001   Epoch: 19   Global Step: 241500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:36:31,279-Speed 2973.29 samples/sec   Loss 0.8563   LearningRate 0.0001   Epoch: 19   Global Step: 241510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:36:34,645-Speed 3043.37 samples/sec   Loss 0.8538   LearningRate 0.0001   Epoch: 19   Global Step: 241520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:36:37,999-Speed 3053.65 samples/sec   Loss 0.9275   LearningRate 0.0001   Epoch: 19   Global Step: 241530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:36:41,363-Speed 3044.89 samples/sec   Loss 0.8734   LearningRate 0.0001   Epoch: 19   Global Step: 241540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:36:44,692-Speed 3077.23 samples/sec   Loss 0.8757   LearningRate 0.0001   Epoch: 19   Global Step: 241550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:36:48,068-Speed 3033.88 samples/sec   Loss 0.8583   LearningRate 0.0001   Epoch: 19   Global Step: 241560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:36:51,466-Speed 3014.74 samples/sec   Loss 0.8593   LearningRate 0.0001   Epoch: 19   Global Step: 241570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:36:54,813-Speed 3059.74 samples/sec   Loss 0.8784   LearningRate 0.0001   Epoch: 19   Global Step: 241580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:36:58,169-Speed 3052.36 samples/sec   Loss 0.8577   LearningRate 0.0001   Epoch: 19   Global Step: 241590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:37:01,542-Speed 3036.86 samples/sec   Loss 0.8196   LearningRate 0.0001   Epoch: 19   Global Step: 241600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:04,992-Speed 2968.83 samples/sec   Loss 0.8830   LearningRate 0.0001   Epoch: 19   Global Step: 241610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:08,338-Speed 3061.14 samples/sec   Loss 0.8609   LearningRate 0.0001   Epoch: 19   Global Step: 241620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:11,734-Speed 3016.88 samples/sec   Loss 0.8693   LearningRate 0.0001   Epoch: 19   Global Step: 241630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:15,227-Speed 2931.95 samples/sec   Loss 0.8715   LearningRate 0.0001   Epoch: 19   Global Step: 241640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:18,649-Speed 2993.08 samples/sec   Loss 0.8793   LearningRate 0.0001   Epoch: 19   Global Step: 241650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:22,089-Speed 2977.90 samples/sec   Loss 0.8923   LearningRate 0.0001   Epoch: 19   Global Step: 241660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:25,453-Speed 3044.37 samples/sec   Loss 0.8995   LearningRate 0.0001   Epoch: 19   Global Step: 241670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:28,824-Speed 3039.26 samples/sec   Loss 0.8792   LearningRate 0.0001   Epoch: 19   Global Step: 241680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:32,195-Speed 3037.87 samples/sec   Loss 0.8727   LearningRate 0.0001   Epoch: 19   Global Step: 241690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:35,540-Speed 3062.44 samples/sec   Loss 0.8603   LearningRate 0.0001   Epoch: 19   Global Step: 241700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:37:38,873-Speed 3073.30 samples/sec   Loss 0.8660   LearningRate 0.0001   Epoch: 19   Global Step: 241710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:37:42,187-Speed 3090.54 samples/sec   Loss 0.8581   LearningRate 0.0001   Epoch: 19   Global Step: 241720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:45,581-Speed 3017.58 samples/sec   Loss 0.8852   LearningRate 0.0001   Epoch: 19   Global Step: 241730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:48,946-Speed 3044.07 samples/sec   Loss 0.8802   LearningRate 0.0001   Epoch: 19   Global Step: 241740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:52,323-Speed 3033.24 samples/sec   Loss 0.8477   LearningRate 0.0001   Epoch: 19   Global Step: 241750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:55,733-Speed 3003.45 samples/sec   Loss 0.8908   LearningRate 0.0001   Epoch: 19   Global Step: 241760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:37:59,080-Speed 3060.37 samples/sec   Loss 0.8543   LearningRate 0.0001   Epoch: 19   Global Step: 241770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:02,475-Speed 3017.08 samples/sec   Loss 0.8637   LearningRate 0.0001   Epoch: 19   Global Step: 241780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:05,827-Speed 3055.46 samples/sec   Loss 0.8910   LearningRate 0.0001   Epoch: 19   Global Step: 241790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:09,163-Speed 3071.02 samples/sec   Loss 0.8902   LearningRate 0.0001   Epoch: 19   Global Step: 241800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:12,493-Speed 3076.01 samples/sec   Loss 0.9249   LearningRate 0.0001   Epoch: 19   Global Step: 241810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:15,811-Speed 3086.79 samples/sec   Loss 0.8757   LearningRate 0.0001   Epoch: 19   Global Step: 241820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:38:19,131-Speed 3085.66 samples/sec   Loss 0.8811   LearningRate 0.0001   Epoch: 19   Global Step: 241830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:22,480-Speed 3058.43 samples/sec   Loss 0.8493   LearningRate 0.0001   Epoch: 19   Global Step: 241840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:25,805-Speed 3080.86 samples/sec   Loss 0.8477   LearningRate 0.0001   Epoch: 19   Global Step: 241850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:29,214-Speed 3004.59 samples/sec   Loss 0.8793   LearningRate 0.0001   Epoch: 19   Global Step: 241860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:32,569-Speed 3053.26 samples/sec   Loss 0.9010   LearningRate 0.0001   Epoch: 19   Global Step: 241870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:35,912-Speed 3063.82 samples/sec   Loss 0.8422   LearningRate 0.0001   Epoch: 19   Global Step: 241880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:39,300-Speed 3023.01 samples/sec   Loss 0.8358   LearningRate 0.0001   Epoch: 19   Global Step: 241890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:42,698-Speed 3014.25 samples/sec   Loss 0.8632   LearningRate 0.0001   Epoch: 19   Global Step: 241900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:46,109-Speed 3003.61 samples/sec   Loss 0.8841   LearningRate 0.0001   Epoch: 19   Global Step: 241910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:49,512-Speed 3009.25 samples/sec   Loss 0.9276   LearningRate 0.0001   Epoch: 19   Global Step: 241920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:38:52,846-Speed 3072.64 samples/sec   Loss 0.9393   LearningRate 0.0001   Epoch: 19   Global Step: 241930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:38:56,218-Speed 3037.75 samples/sec   Loss 0.8910   LearningRate 0.0001   Epoch: 19   Global Step: 241940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:38:59,600-Speed 3028.20 samples/sec   Loss 0.8828   LearningRate 0.0001   Epoch: 19   Global Step: 241950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:39:02,974-Speed 3035.71 samples/sec   Loss 0.8740   LearningRate 0.0001   Epoch: 19   Global Step: 241960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:39:06,286-Speed 3092.52 samples/sec   Loss 0.8781   LearningRate 0.0001   Epoch: 19   Global Step: 241970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:39:09,658-Speed 3038.21 samples/sec   Loss 0.9045   LearningRate 0.0001   Epoch: 19   Global Step: 241980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:39:13,017-Speed 3048.87 samples/sec   Loss 0.8879   LearningRate 0.0001   Epoch: 19   Global Step: 241990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:39:16,368-Speed 3056.72 samples/sec   Loss 0.8884   LearningRate 0.0001   Epoch: 19   Global Step: 242000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:39:19,792-Speed 2991.77 samples/sec   Loss 0.9460   LearningRate 0.0001   Epoch: 19   Global Step: 242010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:39:23,180-Speed 3023.75 samples/sec   Loss 0.8331   LearningRate 0.0001   Epoch: 19   Global Step: 242020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:39:26,548-Speed 3040.60 samples/sec   Loss 0.8672   LearningRate 0.0001   Epoch: 19   Global Step: 242030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:39:29,901-Speed 3055.50 samples/sec   Loss 0.8715   LearningRate 0.0001   Epoch: 19   Global Step: 242040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:39:33,220-Speed 3085.36 samples/sec   Loss 0.8963   LearningRate 0.0001   Epoch: 19   Global Step: 242050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:39:36,558-Speed 3068.34 samples/sec   Loss 0.8761   LearningRate 0.0001   Epoch: 19   Global Step: 242060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:39:39,949-Speed 3020.51 samples/sec   Loss 0.8748   LearningRate 0.0001   Epoch: 19   Global Step: 242070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:39:43,290-Speed 3066.37 samples/sec   Loss 0.8803   LearningRate 0.0001   Epoch: 19   Global Step: 242080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:39:46,656-Speed 3042.27 samples/sec   Loss 0.9496   LearningRate 0.0001   Epoch: 19   Global Step: 242090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:39:50,031-Speed 3035.58 samples/sec   Loss 0.9045   LearningRate 0.0001   Epoch: 19   Global Step: 242100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:39:53,384-Speed 3054.86 samples/sec   Loss 0.8662   LearningRate 0.0001   Epoch: 19   Global Step: 242110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:39:56,732-Speed 3059.25 samples/sec   Loss 0.8596   LearningRate 0.0001   Epoch: 19   Global Step: 242120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:00,062-Speed 3076.07 samples/sec   Loss 0.8949   LearningRate 0.0001   Epoch: 19   Global Step: 242130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:03,421-Speed 3049.08 samples/sec   Loss 0.8569   LearningRate 0.0001   Epoch: 19   Global Step: 242140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:06,805-Speed 3027.19 samples/sec   Loss 0.8804   LearningRate 0.0001   Epoch: 19   Global Step: 242150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:10,180-Speed 3034.14 samples/sec   Loss 0.8670   LearningRate 0.0001   Epoch: 19   Global Step: 242160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:13,538-Speed 3050.68 samples/sec   Loss 0.8747   LearningRate 0.0001   Epoch: 19   Global Step: 242170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:16,911-Speed 3036.91 samples/sec   Loss 0.9096   LearningRate 0.0001   Epoch: 19   Global Step: 242180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:20,288-Speed 3032.60 samples/sec   Loss 0.8948   LearningRate 0.0001   Epoch: 19   Global Step: 242190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:23,619-Speed 3075.13 samples/sec   Loss 0.8632   LearningRate 0.0001   Epoch: 19   Global Step: 242200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:27,008-Speed 3022.42 samples/sec   Loss 0.8498   LearningRate 0.0001   Epoch: 19   Global Step: 242210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:30,402-Speed 3017.50 samples/sec   Loss 0.8660   LearningRate 0.0001   Epoch: 19   Global Step: 242220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:40:33,780-Speed 3032.67 samples/sec   Loss 0.8617   LearningRate 0.0001   Epoch: 19   Global Step: 242230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:40:37,087-Speed 3097.00 samples/sec   Loss 0.8506   LearningRate 0.0001   Epoch: 19   Global Step: 242240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:40:40,499-Speed 3001.87 samples/sec   Loss 0.8328   LearningRate 0.0001   Epoch: 19   Global Step: 242250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:40:43,864-Speed 3044.27 samples/sec   Loss 0.8524   LearningRate 0.0001   Epoch: 19   Global Step: 242260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:40:47,274-Speed 3003.59 samples/sec   Loss 0.8293   LearningRate 0.0001   Epoch: 19   Global Step: 242270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:50,672-Speed 3014.88 samples/sec   Loss 0.9024   LearningRate 0.0001   Epoch: 19   Global Step: 242280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:54,033-Speed 3047.90 samples/sec   Loss 0.8709   LearningRate 0.0001   Epoch: 19   Global Step: 242290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:40:57,487-Speed 2964.87 samples/sec   Loss 0.8402   LearningRate 0.0001   Epoch: 19   Global Step: 242300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:41:00,807-Speed 3086.25 samples/sec   Loss 0.8641   LearningRate 0.0001   Epoch: 19   Global Step: 242310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:41:04,181-Speed 3035.05 samples/sec   Loss 0.8395   LearningRate 0.0001   Epoch: 19   Global Step: 242320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:41:07,501-Speed 3085.27 samples/sec   Loss 0.8708   LearningRate 0.0001   Epoch: 19   Global Step: 242330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:41:10,866-Speed 3044.17 samples/sec   Loss 0.8913   LearningRate 0.0001   Epoch: 19   Global Step: 242340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:41:14,215-Speed 3058.09 samples/sec   Loss 0.9375   LearningRate 0.0001   Epoch: 19   Global Step: 242350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:41:17,610-Speed 3017.07 samples/sec   Loss 0.8718   LearningRate 0.0001   Epoch: 19   Global Step: 242360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:41:20,995-Speed 3026.20 samples/sec   Loss 0.8716   LearningRate 0.0001   Epoch: 19   Global Step: 242370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:41:24,350-Speed 3052.13 samples/sec   Loss 0.8265   LearningRate 0.0001   Epoch: 19   Global Step: 242380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:41:27,772-Speed 2993.81 samples/sec   Loss 0.8745   LearningRate 0.0001   Epoch: 19   Global Step: 242390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:41:31,105-Speed 3072.81 samples/sec   Loss 0.8579   LearningRate 0.0001   Epoch: 19   Global Step: 242400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:41:34,431-Speed 3079.70 samples/sec   Loss 0.8809   LearningRate 0.0001   Epoch: 19   Global Step: 242410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:41:37,778-Speed 3059.69 samples/sec   Loss 0.8418   LearningRate 0.0001   Epoch: 19   Global Step: 242420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:41:41,171-Speed 3019.78 samples/sec   Loss 0.9046   LearningRate 0.0001   Epoch: 19   Global Step: 242430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:41:44,586-Speed 2998.99 samples/sec   Loss 0.8381   LearningRate 0.0001   Epoch: 19   Global Step: 242440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:41:48,040-Speed 2965.60 samples/sec   Loss 0.8577   LearningRate 0.0001   Epoch: 19   Global Step: 242450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:41:51,539-Speed 2927.14 samples/sec   Loss 0.8425   LearningRate 0.0001   Epoch: 19   Global Step: 242460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:41:54,898-Speed 3049.77 samples/sec   Loss 0.8966   LearningRate 0.0001   Epoch: 19   Global Step: 242470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:41:58,254-Speed 3052.44 samples/sec   Loss 0.9006   LearningRate 0.0001   Epoch: 19   Global Step: 242480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:42:01,647-Speed 3018.58 samples/sec   Loss 0.8727   LearningRate 0.0001   Epoch: 19   Global Step: 242490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:04,999-Speed 3055.45 samples/sec   Loss 0.8570   LearningRate 0.0001   Epoch: 19   Global Step: 242500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:08,394-Speed 3017.04 samples/sec   Loss 0.8702   LearningRate 0.0001   Epoch: 19   Global Step: 242510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:11,844-Speed 2968.54 samples/sec   Loss 0.8578   LearningRate 0.0001   Epoch: 19   Global Step: 242520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:15,307-Speed 2958.33 samples/sec   Loss 0.8274   LearningRate 0.0001   Epoch: 19   Global Step: 242530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:18,639-Speed 3073.78 samples/sec   Loss 0.8678   LearningRate 0.0001   Epoch: 19   Global Step: 242540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:22,101-Speed 2958.97 samples/sec   Loss 0.8624   LearningRate 0.0001   Epoch: 19   Global Step: 242550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:25,464-Speed 3045.51 samples/sec   Loss 0.8401   LearningRate 0.0001   Epoch: 19   Global Step: 242560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:28,828-Speed 3045.25 samples/sec   Loss 0.8532   LearningRate 0.0001   Epoch: 19   Global Step: 242570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:32,237-Speed 3004.03 samples/sec   Loss 0.8408   LearningRate 0.0001   Epoch: 19   Global Step: 242580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:35,648-Speed 3003.40 samples/sec   Loss 0.8574   LearningRate 0.0001   Epoch: 19   Global Step: 242590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:42:39,071-Speed 2991.82 samples/sec   Loss 0.8391   LearningRate 0.0001   Epoch: 19   Global Step: 242600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:42:42,433-Speed 3047.10 samples/sec   Loss 0.8794   LearningRate 0.0001   Epoch: 19   Global Step: 242610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:42:45,806-Speed 3036.64 samples/sec   Loss 0.8548   LearningRate 0.0001   Epoch: 19   Global Step: 242620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:49,184-Speed 3032.58 samples/sec   Loss 0.8315   LearningRate 0.0001   Epoch: 19   Global Step: 242630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:52,678-Speed 2931.55 samples/sec   Loss 0.8869   LearningRate 0.0001   Epoch: 19   Global Step: 242640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:56,182-Speed 2923.18 samples/sec   Loss 0.8556   LearningRate 0.0001   Epoch: 19   Global Step: 242650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:42:59,657-Speed 2947.29 samples/sec   Loss 0.9003   LearningRate 0.0001   Epoch: 19   Global Step: 242660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:43:03,050-Speed 3019.12 samples/sec   Loss 0.8584   LearningRate 0.0001   Epoch: 19   Global Step: 242670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:43:06,453-Speed 3009.42 samples/sec   Loss 0.8733   LearningRate 0.0001   Epoch: 19   Global Step: 242680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:43:09,819-Speed 3042.78 samples/sec   Loss 0.8692   LearningRate 0.0001   Epoch: 19   Global Step: 242690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:43:13,200-Speed 3030.11 samples/sec   Loss 0.8464   LearningRate 0.0001   Epoch: 19   Global Step: 242700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:43:16,649-Speed 2969.85 samples/sec   Loss 0.8600   LearningRate 0.0001   Epoch: 19   Global Step: 242710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-28 00:43:20,009-Speed 3048.41 samples/sec   Loss 0.8894   LearningRate 0.0001   Epoch: 19   Global Step: 242720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:43:23,348-Speed 3067.24 samples/sec   Loss 0.8784   LearningRate 0.0001   Epoch: 19   Global Step: 242730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:43:26,713-Speed 3044.28 samples/sec   Loss 0.8683   LearningRate 0.0001   Epoch: 19   Global Step: 242740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:43:30,047-Speed 3072.46 samples/sec   Loss 0.8665   LearningRate 0.0001   Epoch: 19   Global Step: 242750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:43:33,392-Speed 3062.58 samples/sec   Loss 0.8457   LearningRate 0.0001   Epoch: 19   Global Step: 242760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:43:36,702-Speed 3093.76 samples/sec   Loss 0.8131   LearningRate 0.0001   Epoch: 19   Global Step: 242770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:43:40,109-Speed 3006.20 samples/sec   Loss 0.8854   LearningRate 0.0001   Epoch: 19   Global Step: 242780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:43:43,509-Speed 3012.73 samples/sec   Loss 0.8743   LearningRate 0.0001   Epoch: 19   Global Step: 242790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:43:46,923-Speed 3000.63 samples/sec   Loss 0.8836   LearningRate 0.0001   Epoch: 19   Global Step: 242800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:43:50,263-Speed 3066.43 samples/sec   Loss 0.8903   LearningRate 0.0001   Epoch: 19   Global Step: 242810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:43:53,693-Speed 2986.57 samples/sec   Loss 0.8724   LearningRate 0.0001   Epoch: 19   Global Step: 242820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:43:57,124-Speed 2985.50 samples/sec   Loss 0.8931   LearningRate 0.0001   Epoch: 19   Global Step: 242830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:44:00,532-Speed 3005.66 samples/sec   Loss 0.8565   LearningRate 0.0001   Epoch: 19   Global Step: 242840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:04,011-Speed 2944.25 samples/sec   Loss 0.8538   LearningRate 0.0001   Epoch: 19   Global Step: 242850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:07,448-Speed 2979.39 samples/sec   Loss 0.8116   LearningRate 0.0001   Epoch: 19   Global Step: 242860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:10,825-Speed 3033.95 samples/sec   Loss 0.8428   LearningRate 0.0000   Epoch: 19   Global Step: 242870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:14,245-Speed 2994.68 samples/sec   Loss 0.8593   LearningRate 0.0000   Epoch: 19   Global Step: 242880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:17,588-Speed 3063.75 samples/sec   Loss 0.8338   LearningRate 0.0000   Epoch: 19   Global Step: 242890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:20,959-Speed 3037.92 samples/sec   Loss 0.8723   LearningRate 0.0000   Epoch: 19   Global Step: 242900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:24,307-Speed 3059.57 samples/sec   Loss 0.8566   LearningRate 0.0000   Epoch: 19   Global Step: 242910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:27,709-Speed 3011.25 samples/sec   Loss 0.8471   LearningRate 0.0000   Epoch: 19   Global Step: 242920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:31,170-Speed 2959.55 samples/sec   Loss 0.8549   LearningRate 0.0000   Epoch: 19   Global Step: 242930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:34,566-Speed 3015.52 samples/sec   Loss 0.8921   LearningRate 0.0000   Epoch: 19   Global Step: 242940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:44:37,898-Speed 3074.94 samples/sec   Loss 0.8293   LearningRate 0.0000   Epoch: 19   Global Step: 242950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:44:41,353-Speed 2964.25 samples/sec   Loss 0.8184   LearningRate 0.0000   Epoch: 19   Global Step: 242960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:44:44,679-Speed 3079.76 samples/sec   Loss 0.9256   LearningRate 0.0000   Epoch: 19   Global Step: 242970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:48,024-Speed 3062.55 samples/sec   Loss 0.9010   LearningRate 0.0000   Epoch: 19   Global Step: 242980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:51,341-Speed 3087.48 samples/sec   Loss 0.8911   LearningRate 0.0000   Epoch: 19   Global Step: 242990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:54,739-Speed 3014.42 samples/sec   Loss 0.9175   LearningRate 0.0000   Epoch: 19   Global Step: 243000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:44:58,107-Speed 3041.60 samples/sec   Loss 0.8346   LearningRate 0.0000   Epoch: 19   Global Step: 243010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:45:01,463-Speed 3052.02 samples/sec   Loss 0.8629   LearningRate 0.0000   Epoch: 19   Global Step: 243020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:45:04,779-Speed 3088.88 samples/sec   Loss 0.8631   LearningRate 0.0000   Epoch: 19   Global Step: 243030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:45:08,219-Speed 2977.22 samples/sec   Loss 0.8536   LearningRate 0.0000   Epoch: 19   Global Step: 243040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:45:11,604-Speed 3026.57 samples/sec   Loss 0.8382   LearningRate 0.0000   Epoch: 19   Global Step: 243050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:45:15,056-Speed 2967.92 samples/sec   Loss 0.8780   LearningRate 0.0000   Epoch: 19   Global Step: 243060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:45:18,455-Speed 3012.96 samples/sec   Loss 0.8636   LearningRate 0.0000   Epoch: 19   Global Step: 243070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-28 00:45:21,882-Speed 2988.87 samples/sec   Loss 0.9214   LearningRate 0.0000   Epoch: 19   Global Step: 243080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:45:25,351-Speed 2952.69 samples/sec   Loss 0.8590   LearningRate 0.0000   Epoch: 19   Global Step: 243090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:45:28,841-Speed 2935.09 samples/sec   Loss 0.8335   LearningRate 0.0000   Epoch: 19   Global Step: 243100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:45:32,190-Speed 3058.16 samples/sec   Loss 0.8701   LearningRate 0.0000   Epoch: 19   Global Step: 243110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:45:35,525-Speed 3071.49 samples/sec   Loss 0.8776   LearningRate 0.0000   Epoch: 19   Global Step: 243120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:45:38,921-Speed 3016.38 samples/sec   Loss 0.8748   LearningRate 0.0000   Epoch: 19   Global Step: 243130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-28 00:45:42,283-Speed 3046.60 samples/sec   Loss 0.8904   LearningRate 0.0000   Epoch: 19   Global Step: 243140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:45:45,651-Speed 3040.96 samples/sec   Loss 0.8754   LearningRate 0.0000   Epoch: 19   Global Step: 243150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:45:49,088-Speed 2980.86 samples/sec   Loss 0.8657   LearningRate 0.0000   Epoch: 19   Global Step: 243160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:45:52,471-Speed 3027.12 samples/sec   Loss 0.8676   LearningRate 0.0000   Epoch: 19   Global Step: 243170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:45:55,780-Speed 3095.74 samples/sec   Loss 0.8211   LearningRate 0.0000   Epoch: 19   Global Step: 243180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:45:59,117-Speed 3068.97 samples/sec   Loss 0.8705   LearningRate 0.0000   Epoch: 19   Global Step: 243190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:46:02,494-Speed 3033.58 samples/sec   Loss 0.8653   LearningRate 0.0000   Epoch: 19   Global Step: 243200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:46:05,857-Speed 3045.73 samples/sec   Loss 0.8725   LearningRate 0.0000   Epoch: 19   Global Step: 243210   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:46:09,280-Speed 2992.32 samples/sec   Loss 0.9112   LearningRate 0.0000   Epoch: 19   Global Step: 243220   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:46:12,686-Speed 3007.33 samples/sec   Loss 0.8847   LearningRate 0.0000   Epoch: 19   Global Step: 243230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:46:16,099-Speed 3000.88 samples/sec   Loss 0.8405   LearningRate 0.0000   Epoch: 19   Global Step: 243240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:46:19,514-Speed 2999.64 samples/sec   Loss 0.8912   LearningRate 0.0000   Epoch: 19   Global Step: 243250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:46:22,908-Speed 3017.41 samples/sec   Loss 0.8759   LearningRate 0.0000   Epoch: 19   Global Step: 243260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:46:26,218-Speed 3094.82 samples/sec   Loss 0.8360   LearningRate 0.0000   Epoch: 19   Global Step: 243270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:46:29,539-Speed 3084.75 samples/sec   Loss 0.9195   LearningRate 0.0000   Epoch: 19   Global Step: 243280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:46:32,932-Speed 3018.42 samples/sec   Loss 0.8795   LearningRate 0.0000   Epoch: 19   Global Step: 243290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:46:36,413-Speed 2942.65 samples/sec   Loss 0.8578   LearningRate 0.0000   Epoch: 19   Global Step: 243300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:46:39,729-Speed 3088.39 samples/sec   Loss 0.8998   LearningRate 0.0000   Epoch: 19   Global Step: 243310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:46:43,113-Speed 3026.71 samples/sec   Loss 0.9105   LearningRate 0.0000   Epoch: 19   Global Step: 243320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:46:46,508-Speed 3017.05 samples/sec   Loss 0.8897   LearningRate 0.0000   Epoch: 19   Global Step: 243330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:46:49,867-Speed 3049.47 samples/sec   Loss 0.8638   LearningRate 0.0000   Epoch: 19   Global Step: 243340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:46:53,221-Speed 3053.87 samples/sec   Loss 0.8770   LearningRate 0.0000   Epoch: 19   Global Step: 243350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:46:56,706-Speed 2938.84 samples/sec   Loss 0.8687   LearningRate 0.0000   Epoch: 19   Global Step: 243360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:47:00,074-Speed 3041.44 samples/sec   Loss 0.8856   LearningRate 0.0000   Epoch: 19   Global Step: 243370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:47:03,527-Speed 2966.55 samples/sec   Loss 0.8802   LearningRate 0.0000   Epoch: 19   Global Step: 243380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:47:06,907-Speed 3030.32 samples/sec   Loss 0.8676   LearningRate 0.0000   Epoch: 19   Global Step: 243390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:47:10,240-Speed 3073.11 samples/sec   Loss 0.8814   LearningRate 0.0000   Epoch: 19   Global Step: 243400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:47:13,582-Speed 3065.50 samples/sec   Loss 0.8717   LearningRate 0.0000   Epoch: 19   Global Step: 243410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:47:17,078-Speed 2930.08 samples/sec   Loss 0.8758   LearningRate 0.0000   Epoch: 19   Global Step: 243420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:47:20,499-Speed 2994.17 samples/sec   Loss 0.8718   LearningRate 0.0000   Epoch: 19   Global Step: 243430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:47:23,961-Speed 2958.31 samples/sec   Loss 0.8787   LearningRate 0.0000   Epoch: 19   Global Step: 243440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:47:27,383-Speed 2993.29 samples/sec   Loss 0.8678   LearningRate 0.0000   Epoch: 19   Global Step: 243450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:47:30,778-Speed 3017.39 samples/sec   Loss 0.8703   LearningRate 0.0000   Epoch: 19   Global Step: 243460   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:47:34,241-Speed 2957.73 samples/sec   Loss 0.8762   LearningRate 0.0000   Epoch: 19   Global Step: 243470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:47:37,728-Speed 2937.97 samples/sec   Loss 0.8919   LearningRate 0.0000   Epoch: 19   Global Step: 243480   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:47:41,171-Speed 2975.27 samples/sec   Loss 0.8566   LearningRate 0.0000   Epoch: 19   Global Step: 243490   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:47:44,559-Speed 3023.11 samples/sec   Loss 0.8544   LearningRate 0.0000   Epoch: 19   Global Step: 243500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:47:47,975-Speed 2998.68 samples/sec   Loss 0.8354   LearningRate 0.0000   Epoch: 19   Global Step: 243510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:47:51,372-Speed 3015.72 samples/sec   Loss 0.9141   LearningRate 0.0000   Epoch: 19   Global Step: 243520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:47:54,769-Speed 3014.86 samples/sec   Loss 0.8732   LearningRate 0.0000   Epoch: 19   Global Step: 243530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:47:58,203-Speed 2982.64 samples/sec   Loss 0.8713   LearningRate 0.0000   Epoch: 19   Global Step: 243540   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:48:01,614-Speed 3003.24 samples/sec   Loss 0.8328   LearningRate 0.0000   Epoch: 19   Global Step: 243550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:05,012-Speed 3014.07 samples/sec   Loss 0.8649   LearningRate 0.0000   Epoch: 19   Global Step: 243560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:08,379-Speed 3042.05 samples/sec   Loss 0.8863   LearningRate 0.0000   Epoch: 19   Global Step: 243570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:11,797-Speed 2996.96 samples/sec   Loss 0.8704   LearningRate 0.0000   Epoch: 19   Global Step: 243580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:15,226-Speed 2987.20 samples/sec   Loss 0.8376   LearningRate 0.0000   Epoch: 19   Global Step: 243590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:18,736-Speed 2917.67 samples/sec   Loss 0.8832   LearningRate 0.0000   Epoch: 19   Global Step: 243600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:22,110-Speed 3035.83 samples/sec   Loss 0.8502   LearningRate 0.0000   Epoch: 19   Global Step: 243610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:25,510-Speed 3012.97 samples/sec   Loss 0.8491   LearningRate 0.0000   Epoch: 19   Global Step: 243620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:28,971-Speed 2958.97 samples/sec   Loss 0.8764   LearningRate 0.0000   Epoch: 19   Global Step: 243630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:32,437-Speed 2955.48 samples/sec   Loss 0.8704   LearningRate 0.0000   Epoch: 19   Global Step: 243640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:35,817-Speed 3030.21 samples/sec   Loss 0.8583   LearningRate 0.0000   Epoch: 19   Global Step: 243650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:48:39,178-Speed 3047.40 samples/sec   Loss 0.8562   LearningRate 0.0000   Epoch: 19   Global Step: 243660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:42,666-Speed 2936.94 samples/sec   Loss 0.9058   LearningRate 0.0000   Epoch: 19   Global Step: 243670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:46,081-Speed 2998.90 samples/sec   Loss 0.8792   LearningRate 0.0000   Epoch: 19   Global Step: 243680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:49,537-Speed 2963.70 samples/sec   Loss 0.8443   LearningRate 0.0000   Epoch: 19   Global Step: 243690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:52,999-Speed 2958.42 samples/sec   Loss 0.8952   LearningRate 0.0000   Epoch: 19   Global Step: 243700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:56,377-Speed 3032.24 samples/sec   Loss 0.8916   LearningRate 0.0000   Epoch: 19   Global Step: 243710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:48:59,768-Speed 3020.75 samples/sec   Loss 0.8903   LearningRate 0.0000   Epoch: 19   Global Step: 243720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:49:03,207-Speed 2978.49 samples/sec   Loss 0.8894   LearningRate 0.0000   Epoch: 19   Global Step: 243730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:49:06,591-Speed 3026.62 samples/sec   Loss 0.8906   LearningRate 0.0000   Epoch: 19   Global Step: 243740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:49:09,931-Speed 3066.91 samples/sec   Loss 0.8970   LearningRate 0.0000   Epoch: 19   Global Step: 243750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:49:13,351-Speed 2995.26 samples/sec   Loss 0.8854   LearningRate 0.0000   Epoch: 19   Global Step: 243760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:49:16,699-Speed 3059.13 samples/sec   Loss 0.8912   LearningRate 0.0000   Epoch: 19   Global Step: 243770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:49:20,059-Speed 3049.02 samples/sec   Loss 0.8723   LearningRate 0.0000   Epoch: 19   Global Step: 243780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:49:23,505-Speed 2972.11 samples/sec   Loss 0.8840   LearningRate 0.0000   Epoch: 19   Global Step: 243790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:49:26,919-Speed 3000.23 samples/sec   Loss 0.8539   LearningRate 0.0000   Epoch: 19   Global Step: 243800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:49:30,352-Speed 2984.04 samples/sec   Loss 0.8411   LearningRate 0.0000   Epoch: 19   Global Step: 243810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:49:33,702-Speed 3057.09 samples/sec   Loss 0.8419   LearningRate 0.0000   Epoch: 19   Global Step: 243820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:49:37,120-Speed 2997.31 samples/sec   Loss 0.8341   LearningRate 0.0000   Epoch: 19   Global Step: 243830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:49:40,543-Speed 2991.86 samples/sec   Loss 0.8313   LearningRate 0.0000   Epoch: 19   Global Step: 243840   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:49:43,891-Speed 3059.52 samples/sec   Loss 0.8603   LearningRate 0.0000   Epoch: 19   Global Step: 243850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:49:47,279-Speed 3023.25 samples/sec   Loss 0.8766   LearningRate 0.0000   Epoch: 19   Global Step: 243860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:49:50,777-Speed 2928.71 samples/sec   Loss 0.8503   LearningRate 0.0000   Epoch: 19   Global Step: 243870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:49:54,186-Speed 3004.31 samples/sec   Loss 0.8692   LearningRate 0.0000   Epoch: 19   Global Step: 243880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:49:57,636-Speed 2969.08 samples/sec   Loss 0.8819   LearningRate 0.0000   Epoch: 19   Global Step: 243890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:01,101-Speed 2956.64 samples/sec   Loss 0.8678   LearningRate 0.0000   Epoch: 19   Global Step: 243900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:04,504-Speed 3009.05 samples/sec   Loss 0.8524   LearningRate 0.0000   Epoch: 19   Global Step: 243910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:07,925-Speed 2994.88 samples/sec   Loss 0.8826   LearningRate 0.0000   Epoch: 19   Global Step: 243920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:11,337-Speed 3001.89 samples/sec   Loss 0.8386   LearningRate 0.0000   Epoch: 19   Global Step: 243930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:14,839-Speed 2924.74 samples/sec   Loss 0.8576   LearningRate 0.0000   Epoch: 19   Global Step: 243940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:18,192-Speed 3054.30 samples/sec   Loss 0.9177   LearningRate 0.0000   Epoch: 19   Global Step: 243950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:21,608-Speed 2998.91 samples/sec   Loss 0.8546   LearningRate 0.0000   Epoch: 19   Global Step: 243960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:24,997-Speed 3022.74 samples/sec   Loss 0.8513   LearningRate 0.0000   Epoch: 19   Global Step: 243970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:28,411-Speed 3000.29 samples/sec   Loss 0.8961   LearningRate 0.0000   Epoch: 19   Global Step: 243980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:31,911-Speed 2926.65 samples/sec   Loss 0.8561   LearningRate 0.0000   Epoch: 19   Global Step: 243990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:35,376-Speed 2956.09 samples/sec   Loss 0.8588   LearningRate 0.0000   Epoch: 19   Global Step: 244000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:38,714-Speed 3068.17 samples/sec   Loss 0.8484   LearningRate 0.0000   Epoch: 19   Global Step: 244010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:42,130-Speed 2998.33 samples/sec   Loss 0.8466   LearningRate 0.0000   Epoch: 19   Global Step: 244020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:45,502-Speed 3037.53 samples/sec   Loss 0.8952   LearningRate 0.0000   Epoch: 19   Global Step: 244030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:48,994-Speed 2933.30 samples/sec   Loss 0.8733   LearningRate 0.0000   Epoch: 19   Global Step: 244040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:52,340-Speed 3061.69 samples/sec   Loss 0.8732   LearningRate 0.0000   Epoch: 19   Global Step: 244050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:55,765-Speed 2990.38 samples/sec   Loss 0.8460   LearningRate 0.0000   Epoch: 19   Global Step: 244060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:50:59,256-Speed 2933.71 samples/sec   Loss 0.8841   LearningRate 0.0000   Epoch: 19   Global Step: 244070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:02,665-Speed 3004.78 samples/sec   Loss 0.8514   LearningRate 0.0000   Epoch: 19   Global Step: 244080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:51:06,135-Speed 2951.83 samples/sec   Loss 0.8822   LearningRate 0.0000   Epoch: 19   Global Step: 244090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:51:09,528-Speed 3018.85 samples/sec   Loss 0.8534   LearningRate 0.0000   Epoch: 19   Global Step: 244100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:12,843-Speed 3089.76 samples/sec   Loss 0.8779   LearningRate 0.0000   Epoch: 19   Global Step: 244110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:16,268-Speed 2990.76 samples/sec   Loss 0.8333   LearningRate 0.0000   Epoch: 19   Global Step: 244120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:19,630-Speed 3046.59 samples/sec   Loss 0.8610   LearningRate 0.0000   Epoch: 19   Global Step: 244130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:23,035-Speed 3008.75 samples/sec   Loss 0.8777   LearningRate 0.0000   Epoch: 19   Global Step: 244140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:26,368-Speed 3073.18 samples/sec   Loss 0.8218   LearningRate 0.0000   Epoch: 19   Global Step: 244150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:29,701-Speed 3073.39 samples/sec   Loss 0.8763   LearningRate 0.0000   Epoch: 19   Global Step: 244160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:33,120-Speed 2995.55 samples/sec   Loss 0.8661   LearningRate 0.0000   Epoch: 19   Global Step: 244170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:36,466-Speed 3061.25 samples/sec   Loss 0.8652   LearningRate 0.0000   Epoch: 19   Global Step: 244180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:39,825-Speed 3048.74 samples/sec   Loss 0.8850   LearningRate 0.0000   Epoch: 19   Global Step: 244190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:43,212-Speed 3024.88 samples/sec   Loss 0.8420   LearningRate 0.0000   Epoch: 19   Global Step: 244200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:46,640-Speed 2987.47 samples/sec   Loss 0.8739   LearningRate 0.0000   Epoch: 19   Global Step: 244210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:50,123-Speed 2940.88 samples/sec   Loss 0.8916   LearningRate 0.0000   Epoch: 19   Global Step: 244220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:53,568-Speed 2972.87 samples/sec   Loss 0.8983   LearningRate 0.0000   Epoch: 19   Global Step: 244230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:51:57,008-Speed 2977.80 samples/sec   Loss 0.8725   LearningRate 0.0000   Epoch: 19   Global Step: 244240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:00,399-Speed 3020.59 samples/sec   Loss 0.8616   LearningRate 0.0000   Epoch: 19   Global Step: 244250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:03,858-Speed 2961.06 samples/sec   Loss 0.8599   LearningRate 0.0000   Epoch: 19   Global Step: 244260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:07,369-Speed 2917.69 samples/sec   Loss 0.8803   LearningRate 0.0000   Epoch: 19   Global Step: 244270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:10,818-Speed 2969.51 samples/sec   Loss 0.8508   LearningRate 0.0000   Epoch: 19   Global Step: 244280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:14,157-Speed 3067.99 samples/sec   Loss 0.8677   LearningRate 0.0000   Epoch: 19   Global Step: 244290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:17,520-Speed 3045.81 samples/sec   Loss 0.8750   LearningRate 0.0000   Epoch: 19   Global Step: 244300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:52:20,940-Speed 2994.86 samples/sec   Loss 0.8479   LearningRate 0.0000   Epoch: 19   Global Step: 244310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:52:24,317-Speed 3033.53 samples/sec   Loss 0.8437   LearningRate 0.0000   Epoch: 19   Global Step: 244320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:52:27,877-Speed 2877.17 samples/sec   Loss 0.8939   LearningRate 0.0000   Epoch: 19   Global Step: 244330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:52:31,256-Speed 3031.49 samples/sec   Loss 0.8192   LearningRate 0.0000   Epoch: 19   Global Step: 244340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:34,604-Speed 3059.12 samples/sec   Loss 0.8430   LearningRate 0.0000   Epoch: 19   Global Step: 244350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:38,029-Speed 2990.36 samples/sec   Loss 0.8940   LearningRate 0.0000   Epoch: 19   Global Step: 244360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:41,387-Speed 3050.06 samples/sec   Loss 0.8635   LearningRate 0.0000   Epoch: 19   Global Step: 244370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:44,801-Speed 3000.75 samples/sec   Loss 0.8944   LearningRate 0.0000   Epoch: 19   Global Step: 244380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:48,153-Speed 3054.99 samples/sec   Loss 0.8810   LearningRate 0.0000   Epoch: 19   Global Step: 244390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:51,504-Speed 3057.59 samples/sec   Loss 0.8781   LearningRate 0.0000   Epoch: 19   Global Step: 244400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:54,883-Speed 3031.19 samples/sec   Loss 0.8535   LearningRate 0.0000   Epoch: 19   Global Step: 244410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:52:58,254-Speed 3038.11 samples/sec   Loss 0.8809   LearningRate 0.0000   Epoch: 19   Global Step: 244420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:53:01,654-Speed 3012.77 samples/sec   Loss 0.8524   LearningRate 0.0000   Epoch: 19   Global Step: 244430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:53:05,010-Speed 3051.53 samples/sec   Loss 0.8422   LearningRate 0.0000   Epoch: 19   Global Step: 244440   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:53:08,380-Speed 3039.92 samples/sec   Loss 0.8365   LearningRate 0.0000   Epoch: 19   Global Step: 244450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:53:11,846-Speed 2955.40 samples/sec   Loss 0.8864   LearningRate 0.0000   Epoch: 19   Global Step: 244460   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:53:15,297-Speed 2967.68 samples/sec   Loss 0.8635   LearningRate 0.0000   Epoch: 19   Global Step: 244470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:53:18,688-Speed 3021.34 samples/sec   Loss 0.9106   LearningRate 0.0000   Epoch: 19   Global Step: 244480   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:53:22,084-Speed 3018.42 samples/sec   Loss 0.8522   LearningRate 0.0000   Epoch: 19   Global Step: 244490   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:53:25,447-Speed 3045.13 samples/sec   Loss 0.8839   LearningRate 0.0000   Epoch: 19   Global Step: 244500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:53:28,872-Speed 2991.13 samples/sec   Loss 0.8602   LearningRate 0.0000   Epoch: 19   Global Step: 244510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:53:32,308-Speed 2980.35 samples/sec   Loss 0.8534   LearningRate 0.0000   Epoch: 19   Global Step: 244520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:53:35,762-Speed 2965.80 samples/sec   Loss 0.8688   LearningRate 0.0000   Epoch: 19   Global Step: 244530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:53:39,182-Speed 2994.52 samples/sec   Loss 0.8662   LearningRate 0.0000   Epoch: 19   Global Step: 244540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:53:42,708-Speed 2904.99 samples/sec   Loss 0.8944   LearningRate 0.0000   Epoch: 19   Global Step: 244550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:53:46,082-Speed 3035.80 samples/sec   Loss 0.8757   LearningRate 0.0000   Epoch: 19   Global Step: 244560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:53:49,466-Speed 3026.74 samples/sec   Loss 0.8997   LearningRate 0.0000   Epoch: 19   Global Step: 244570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:53:52,914-Speed 2971.14 samples/sec   Loss 0.9236   LearningRate 0.0000   Epoch: 19   Global Step: 244580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:53:56,335-Speed 2994.06 samples/sec   Loss 0.8484   LearningRate 0.0000   Epoch: 19   Global Step: 244590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:53:59,825-Speed 2934.44 samples/sec   Loss 0.8306   LearningRate 0.0000   Epoch: 19   Global Step: 244600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:54:03,244-Speed 2996.37 samples/sec   Loss 0.8632   LearningRate 0.0000   Epoch: 19   Global Step: 244610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:54:06,642-Speed 3013.92 samples/sec   Loss 0.8774   LearningRate 0.0000   Epoch: 19   Global Step: 244620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:54:10,043-Speed 3012.41 samples/sec   Loss 0.9036   LearningRate 0.0000   Epoch: 19   Global Step: 244630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:54:13,442-Speed 3013.58 samples/sec   Loss 0.8796   LearningRate 0.0000   Epoch: 19   Global Step: 244640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:54:16,781-Speed 3066.93 samples/sec   Loss 0.8525   LearningRate 0.0000   Epoch: 19   Global Step: 244650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:54:20,119-Speed 3069.17 samples/sec   Loss 0.8918   LearningRate 0.0000   Epoch: 19   Global Step: 244660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:54:23,485-Speed 3043.30 samples/sec   Loss 0.8996   LearningRate 0.0000   Epoch: 19   Global Step: 244670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:54:26,859-Speed 3035.32 samples/sec   Loss 0.8731   LearningRate 0.0000   Epoch: 19   Global Step: 244680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:54:30,382-Speed 2907.29 samples/sec   Loss 0.8293   LearningRate 0.0000   Epoch: 19   Global Step: 244690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:54:33,806-Speed 2991.11 samples/sec   Loss 0.8873   LearningRate 0.0000   Epoch: 19   Global Step: 244700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:54:37,210-Speed 3009.35 samples/sec   Loss 0.8564   LearningRate 0.0000   Epoch: 19   Global Step: 244710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:54:40,700-Speed 2934.63 samples/sec   Loss 0.8816   LearningRate 0.0000   Epoch: 19   Global Step: 244720   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:54:44,147-Speed 2971.98 samples/sec   Loss 0.8982   LearningRate 0.0000   Epoch: 19   Global Step: 244730   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:54:47,541-Speed 3017.53 samples/sec   Loss 0.8615   LearningRate 0.0000   Epoch: 19   Global Step: 244740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:54:50,938-Speed 3015.39 samples/sec   Loss 0.8670   LearningRate 0.0000   Epoch: 19   Global Step: 244750   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:54:54,312-Speed 3035.72 samples/sec   Loss 0.8330   LearningRate 0.0000   Epoch: 19   Global Step: 244760   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:54:57,707-Speed 3016.50 samples/sec   Loss 0.8381   LearningRate 0.0000   Epoch: 19   Global Step: 244770   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:55:01,081-Speed 3036.40 samples/sec   Loss 0.8665   LearningRate 0.0000   Epoch: 19   Global Step: 244780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:55:04,433-Speed 3055.21 samples/sec   Loss 0.8709   LearningRate 0.0000   Epoch: 19   Global Step: 244790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:55:07,786-Speed 3055.26 samples/sec   Loss 0.8543   LearningRate 0.0000   Epoch: 19   Global Step: 244800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:55:11,160-Speed 3035.38 samples/sec   Loss 0.8298   LearningRate 0.0000   Epoch: 19   Global Step: 244810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:55:14,538-Speed 3032.98 samples/sec   Loss 0.8681   LearningRate 0.0000   Epoch: 19   Global Step: 244820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:17,927-Speed 3022.24 samples/sec   Loss 0.8810   LearningRate 0.0000   Epoch: 19   Global Step: 244830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:21,315-Speed 3023.39 samples/sec   Loss 0.8585   LearningRate 0.0000   Epoch: 19   Global Step: 244840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:24,710-Speed 3017.21 samples/sec   Loss 0.8960   LearningRate 0.0000   Epoch: 19   Global Step: 244850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:28,108-Speed 3013.89 samples/sec   Loss 0.8556   LearningRate 0.0000   Epoch: 19   Global Step: 244860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:31,451-Speed 3064.06 samples/sec   Loss 0.9173   LearningRate 0.0000   Epoch: 19   Global Step: 244870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:34,802-Speed 3056.58 samples/sec   Loss 0.8926   LearningRate 0.0000   Epoch: 19   Global Step: 244880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:38,194-Speed 3019.93 samples/sec   Loss 0.8699   LearningRate 0.0000   Epoch: 19   Global Step: 244890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:41,543-Speed 3059.29 samples/sec   Loss 0.8257   LearningRate 0.0000   Epoch: 19   Global Step: 244900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:44,906-Speed 3045.72 samples/sec   Loss 0.8558   LearningRate 0.0000   Epoch: 19   Global Step: 244910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:48,308-Speed 3010.03 samples/sec   Loss 0.8713   LearningRate 0.0000   Epoch: 19   Global Step: 244920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:51,730-Speed 2994.27 samples/sec   Loss 0.8716   LearningRate 0.0000   Epoch: 19   Global Step: 244930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:55,112-Speed 3028.16 samples/sec   Loss 0.8788   LearningRate 0.0000   Epoch: 19   Global Step: 244940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:55:58,486-Speed 3035.53 samples/sec   Loss 0.8598   LearningRate 0.0000   Epoch: 19   Global Step: 244950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:56:01,838-Speed 3056.27 samples/sec   Loss 0.8681   LearningRate 0.0000   Epoch: 19   Global Step: 244960   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:56:05,194-Speed 3051.64 samples/sec   Loss 0.8655   LearningRate 0.0000   Epoch: 19   Global Step: 244970   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:56:08,530-Speed 3070.29 samples/sec   Loss 0.8841   LearningRate 0.0000   Epoch: 19   Global Step: 244980   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:56:11,898-Speed 3041.75 samples/sec   Loss 0.8827   LearningRate 0.0000   Epoch: 19   Global Step: 244990   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:56:15,287-Speed 3021.83 samples/sec   Loss 0.8477   LearningRate 0.0000   Epoch: 19   Global Step: 245000   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:56:18,689-Speed 3010.84 samples/sec   Loss 0.8543   LearningRate 0.0000   Epoch: 19   Global Step: 245010   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:56:22,050-Speed 3047.88 samples/sec   Loss 0.8477   LearningRate 0.0000   Epoch: 19   Global Step: 245020   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:56:25,455-Speed 3007.61 samples/sec   Loss 0.8903   LearningRate 0.0000   Epoch: 19   Global Step: 245030   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:56:28,822-Speed 3042.15 samples/sec   Loss 0.8681   LearningRate 0.0000   Epoch: 19   Global Step: 245040   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:56:32,181-Speed 3049.25 samples/sec   Loss 0.8176   LearningRate 0.0000   Epoch: 19   Global Step: 245050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:56:35,557-Speed 3034.17 samples/sec   Loss 0.8448   LearningRate 0.0000   Epoch: 19   Global Step: 245060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:56:38,929-Speed 3037.69 samples/sec   Loss 0.8786   LearningRate 0.0000   Epoch: 19   Global Step: 245070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:56:42,277-Speed 3059.26 samples/sec   Loss 0.8774   LearningRate 0.0000   Epoch: 19   Global Step: 245080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:56:45,689-Speed 3002.12 samples/sec   Loss 0.9153   LearningRate 0.0000   Epoch: 19   Global Step: 245090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:56:49,030-Speed 3065.46 samples/sec   Loss 0.8913   LearningRate 0.0000   Epoch: 19   Global Step: 245100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:56:52,392-Speed 3046.52 samples/sec   Loss 0.8560   LearningRate 0.0000   Epoch: 19   Global Step: 245110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:56:55,802-Speed 3003.66 samples/sec   Loss 0.8549   LearningRate 0.0000   Epoch: 19   Global Step: 245120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:56:59,145-Speed 3064.46 samples/sec   Loss 0.8735   LearningRate 0.0000   Epoch: 19   Global Step: 245130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:57:02,545-Speed 3012.28 samples/sec   Loss 0.8570   LearningRate 0.0000   Epoch: 19   Global Step: 245140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:57:05,872-Speed 3078.55 samples/sec   Loss 0.8835   LearningRate 0.0000   Epoch: 19   Global Step: 245150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:57:09,344-Speed 2950.79 samples/sec   Loss 0.8518   LearningRate 0.0000   Epoch: 19   Global Step: 245160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:57:12,799-Speed 2964.06 samples/sec   Loss 0.8482   LearningRate 0.0000   Epoch: 19   Global Step: 245170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:57:16,184-Speed 3026.19 samples/sec   Loss 0.8753   LearningRate 0.0000   Epoch: 19   Global Step: 245180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:57:19,546-Speed 3047.02 samples/sec   Loss 0.9030   LearningRate 0.0000   Epoch: 19   Global Step: 245190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:57:22,973-Speed 2988.61 samples/sec   Loss 0.8874   LearningRate 0.0000   Epoch: 19   Global Step: 245200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:57:26,345-Speed 3037.00 samples/sec   Loss 0.8715   LearningRate 0.0000   Epoch: 19   Global Step: 245210   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:57:29,820-Speed 2947.81 samples/sec   Loss 0.8507   LearningRate 0.0000   Epoch: 19   Global Step: 245220   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:57:33,209-Speed 3022.14 samples/sec   Loss 0.8686   LearningRate 0.0000   Epoch: 19   Global Step: 245230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:57:36,584-Speed 3035.29 samples/sec   Loss 0.8529   LearningRate 0.0000   Epoch: 19   Global Step: 245240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:57:39,954-Speed 3039.23 samples/sec   Loss 0.8366   LearningRate 0.0000   Epoch: 19   Global Step: 245250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:57:43,354-Speed 3012.95 samples/sec   Loss 0.8553   LearningRate 0.0000   Epoch: 19   Global Step: 245260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:57:46,659-Speed 3098.81 samples/sec   Loss 0.8400   LearningRate 0.0000   Epoch: 19   Global Step: 245270   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:57:50,069-Speed 3003.97 samples/sec   Loss 0.8796   LearningRate 0.0000   Epoch: 19   Global Step: 245280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:57:53,403-Speed 3072.32 samples/sec   Loss 0.8763   LearningRate 0.0000   Epoch: 19   Global Step: 245290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:57:56,789-Speed 3024.72 samples/sec   Loss 0.8624   LearningRate 0.0000   Epoch: 19   Global Step: 245300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:00,193-Speed 3008.22 samples/sec   Loss 0.8560   LearningRate 0.0000   Epoch: 19   Global Step: 245310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:03,633-Speed 2977.84 samples/sec   Loss 0.8815   LearningRate 0.0000   Epoch: 19   Global Step: 245320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:07,088-Speed 2964.82 samples/sec   Loss 0.8089   LearningRate 0.0000   Epoch: 19   Global Step: 245330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:10,447-Speed 3049.44 samples/sec   Loss 0.8214   LearningRate 0.0000   Epoch: 19   Global Step: 245340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:13,910-Speed 2958.22 samples/sec   Loss 0.8590   LearningRate 0.0000   Epoch: 19   Global Step: 245350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:17,293-Speed 3027.89 samples/sec   Loss 0.8583   LearningRate 0.0000   Epoch: 19   Global Step: 245360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:20,744-Speed 2967.53 samples/sec   Loss 0.8810   LearningRate 0.0000   Epoch: 19   Global Step: 245370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:24,175-Speed 2985.70 samples/sec   Loss 0.8499   LearningRate 0.0000   Epoch: 19   Global Step: 245380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:58:27,529-Speed 3053.36 samples/sec   Loss 0.8771   LearningRate 0.0000   Epoch: 19   Global Step: 245390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:58:30,923-Speed 3017.68 samples/sec   Loss 0.8951   LearningRate 0.0000   Epoch: 19   Global Step: 245400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:58:34,340-Speed 2997.77 samples/sec   Loss 0.8650   LearningRate 0.0000   Epoch: 19   Global Step: 245410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 00:58:37,695-Speed 3053.03 samples/sec   Loss 0.8908   LearningRate 0.0000   Epoch: 19   Global Step: 245420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:41,079-Speed 3026.95 samples/sec   Loss 0.8376   LearningRate 0.0000   Epoch: 19   Global Step: 245430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:44,483-Speed 3009.46 samples/sec   Loss 0.8227   LearningRate 0.0000   Epoch: 19   Global Step: 245440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:47,818-Speed 3070.32 samples/sec   Loss 0.8672   LearningRate 0.0000   Epoch: 19   Global Step: 245450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:51,160-Speed 3065.65 samples/sec   Loss 0.8526   LearningRate 0.0000   Epoch: 19   Global Step: 245460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:54,545-Speed 3025.64 samples/sec   Loss 0.8511   LearningRate 0.0000   Epoch: 19   Global Step: 245470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:58:57,933-Speed 3022.76 samples/sec   Loss 0.8746   LearningRate 0.0000   Epoch: 19   Global Step: 245480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:59:01,340-Speed 3006.66 samples/sec   Loss 0.8236   LearningRate 0.0000   Epoch: 19   Global Step: 245490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:59:04,793-Speed 2966.16 samples/sec   Loss 0.8510   LearningRate 0.0000   Epoch: 19   Global Step: 245500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:59:08,161-Speed 3041.56 samples/sec   Loss 0.8833   LearningRate 0.0000   Epoch: 19   Global Step: 245510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:59:11,531-Speed 3039.47 samples/sec   Loss 0.8081   LearningRate 0.0000   Epoch: 19   Global Step: 245520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:59:14,902-Speed 3038.74 samples/sec   Loss 0.8444   LearningRate 0.0000   Epoch: 19   Global Step: 245530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:59:18,245-Speed 3063.60 samples/sec   Loss 0.8740   LearningRate 0.0000   Epoch: 19   Global Step: 245540   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:59:21,602-Speed 3051.42 samples/sec   Loss 0.8920   LearningRate 0.0000   Epoch: 19   Global Step: 245550   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:59:25,008-Speed 3007.79 samples/sec   Loss 0.8763   LearningRate 0.0000   Epoch: 19   Global Step: 245560   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:59:28,380-Speed 3037.52 samples/sec   Loss 0.8714   LearningRate 0.0000   Epoch: 19   Global Step: 245570   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:59:31,807-Speed 2988.58 samples/sec   Loss 0.8846   LearningRate 0.0000   Epoch: 19   Global Step: 245580   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:59:35,187-Speed 3030.49 samples/sec   Loss 0.9176   LearningRate 0.0000   Epoch: 19   Global Step: 245590   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:59:38,569-Speed 3028.30 samples/sec   Loss 0.8390   LearningRate 0.0000   Epoch: 19   Global Step: 245600   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:59:41,929-Speed 3048.91 samples/sec   Loss 0.8795   LearningRate 0.0000   Epoch: 19   Global Step: 245610   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 00:59:45,394-Speed 2956.29 samples/sec   Loss 0.8998   LearningRate 0.0000   Epoch: 19   Global Step: 245620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:59:48,795-Speed 3011.64 samples/sec   Loss 0.8524   LearningRate 0.0000   Epoch: 19   Global Step: 245630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:59:52,227-Speed 2984.91 samples/sec   Loss 0.8801   LearningRate 0.0000   Epoch: 19   Global Step: 245640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:59:55,560-Speed 3072.66 samples/sec   Loss 0.9158   LearningRate 0.0000   Epoch: 19   Global Step: 245650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 00:59:58,926-Speed 3042.89 samples/sec   Loss 0.8740   LearningRate 0.0000   Epoch: 19   Global Step: 245660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:02,330-Speed 3009.67 samples/sec   Loss 0.8524   LearningRate 0.0000   Epoch: 19   Global Step: 245670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:05,727-Speed 3015.36 samples/sec   Loss 0.8735   LearningRate 0.0000   Epoch: 19   Global Step: 245680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:09,083-Speed 3051.57 samples/sec   Loss 0.8696   LearningRate 0.0000   Epoch: 19   Global Step: 245690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:12,415-Speed 3074.68 samples/sec   Loss 0.8613   LearningRate 0.0000   Epoch: 19   Global Step: 245700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:15,786-Speed 3038.41 samples/sec   Loss 0.8859   LearningRate 0.0000   Epoch: 19   Global Step: 245710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:19,215-Speed 2986.60 samples/sec   Loss 0.8313   LearningRate 0.0000   Epoch: 19   Global Step: 245720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:00:22,572-Speed 3051.24 samples/sec   Loss 0.8763   LearningRate 0.0000   Epoch: 19   Global Step: 245730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:25,940-Speed 3041.48 samples/sec   Loss 0.8374   LearningRate 0.0000   Epoch: 19   Global Step: 245740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:29,337-Speed 3015.66 samples/sec   Loss 0.8574   LearningRate 0.0000   Epoch: 19   Global Step: 245750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:32,711-Speed 3036.43 samples/sec   Loss 0.8659   LearningRate 0.0000   Epoch: 19   Global Step: 245760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:36,056-Speed 3061.96 samples/sec   Loss 0.9077   LearningRate 0.0000   Epoch: 19   Global Step: 245770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:39,436-Speed 3029.91 samples/sec   Loss 0.8687   LearningRate 0.0000   Epoch: 19   Global Step: 245780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:42,907-Speed 2950.64 samples/sec   Loss 0.8418   LearningRate 0.0000   Epoch: 19   Global Step: 245790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:46,296-Speed 3022.48 samples/sec   Loss 0.8692   LearningRate 0.0000   Epoch: 19   Global Step: 245800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:49,750-Speed 2965.61 samples/sec   Loss 0.8820   LearningRate 0.0000   Epoch: 19   Global Step: 245810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:53,103-Speed 3054.69 samples/sec   Loss 0.8831   LearningRate 0.0000   Epoch: 19   Global Step: 245820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:00:56,443-Speed 3066.32 samples/sec   Loss 0.8660   LearningRate 0.0000   Epoch: 19   Global Step: 245830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:00:59,837-Speed 3019.01 samples/sec   Loss 0.8958   LearningRate 0.0000   Epoch: 19   Global Step: 245840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:03,204-Speed 3042.18 samples/sec   Loss 0.8605   LearningRate 0.0000   Epoch: 19   Global Step: 245850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:06,588-Speed 3026.82 samples/sec   Loss 0.8722   LearningRate 0.0000   Epoch: 19   Global Step: 245860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:09,997-Speed 3004.84 samples/sec   Loss 0.8690   LearningRate 0.0000   Epoch: 19   Global Step: 245870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:13,362-Speed 3044.00 samples/sec   Loss 0.8713   LearningRate 0.0000   Epoch: 19   Global Step: 245880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:16,734-Speed 3037.26 samples/sec   Loss 0.8778   LearningRate 0.0000   Epoch: 19   Global Step: 245890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:20,088-Speed 3054.29 samples/sec   Loss 0.8624   LearningRate 0.0000   Epoch: 19   Global Step: 245900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:23,547-Speed 2961.30 samples/sec   Loss 0.8362   LearningRate 0.0000   Epoch: 19   Global Step: 245910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:26,921-Speed 3035.46 samples/sec   Loss 0.8762   LearningRate 0.0000   Epoch: 19   Global Step: 245920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:30,426-Speed 2922.54 samples/sec   Loss 0.8669   LearningRate 0.0000   Epoch: 19   Global Step: 245930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:33,842-Speed 2998.38 samples/sec   Loss 0.8715   LearningRate 0.0000   Epoch: 19   Global Step: 245940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:01:37,235-Speed 3019.12 samples/sec   Loss 0.8505   LearningRate 0.0000   Epoch: 19   Global Step: 245950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:40,596-Speed 3046.89 samples/sec   Loss 0.8678   LearningRate 0.0000   Epoch: 19   Global Step: 245960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:43,940-Speed 3063.68 samples/sec   Loss 0.8502   LearningRate 0.0000   Epoch: 19   Global Step: 245970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:47,299-Speed 3049.21 samples/sec   Loss 0.8543   LearningRate 0.0000   Epoch: 19   Global Step: 245980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:50,691-Speed 3019.10 samples/sec   Loss 0.8812   LearningRate 0.0000   Epoch: 19   Global Step: 245990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:01:54,052-Speed 3047.82 samples/sec   Loss 0.8673   LearningRate 0.0000   Epoch: 19   Global Step: 246000   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:01:57,435-Speed 3027.30 samples/sec   Loss 0.8718   LearningRate 0.0000   Epoch: 19   Global Step: 246010   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:00,758-Speed 3082.54 samples/sec   Loss 0.8943   LearningRate 0.0000   Epoch: 19   Global Step: 246020   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:04,141-Speed 3028.22 samples/sec   Loss 0.8547   LearningRate 0.0000   Epoch: 19   Global Step: 246030   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:07,483-Speed 3065.53 samples/sec   Loss 0.8359   LearningRate 0.0000   Epoch: 19   Global Step: 246040   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:10,852-Speed 3039.78 samples/sec   Loss 0.8827   LearningRate 0.0000   Epoch: 19   Global Step: 246050   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:14,188-Speed 3070.23 samples/sec   Loss 0.8768   LearningRate 0.0000   Epoch: 19   Global Step: 246060   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:17,581-Speed 3019.09 samples/sec   Loss 0.8906   LearningRate 0.0000   Epoch: 19   Global Step: 246070   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:20,915-Speed 3071.79 samples/sec   Loss 0.9052   LearningRate 0.0000   Epoch: 19   Global Step: 246080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:24,294-Speed 3031.96 samples/sec   Loss 0.8744   LearningRate 0.0000   Epoch: 19   Global Step: 246090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:27,695-Speed 3011.67 samples/sec   Loss 0.8439   LearningRate 0.0000   Epoch: 19   Global Step: 246100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:02:31,067-Speed 3037.10 samples/sec   Loss 0.8763   LearningRate 0.0000   Epoch: 19   Global Step: 246110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:34,544-Speed 2946.29 samples/sec   Loss 0.8885   LearningRate 0.0000   Epoch: 19   Global Step: 246120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:37,974-Speed 2986.01 samples/sec   Loss 0.8640   LearningRate 0.0000   Epoch: 19   Global Step: 246130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:41,366-Speed 3019.84 samples/sec   Loss 0.8549   LearningRate 0.0000   Epoch: 19   Global Step: 246140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:44,775-Speed 3004.80 samples/sec   Loss 0.8572   LearningRate 0.0000   Epoch: 19   Global Step: 246150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:48,180-Speed 3008.02 samples/sec   Loss 0.8161   LearningRate 0.0000   Epoch: 19   Global Step: 246160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:51,544-Speed 3045.11 samples/sec   Loss 0.8876   LearningRate 0.0000   Epoch: 19   Global Step: 246170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:54,892-Speed 3058.93 samples/sec   Loss 0.8395   LearningRate 0.0000   Epoch: 19   Global Step: 246180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:02:58,295-Speed 3010.05 samples/sec   Loss 0.8467   LearningRate 0.0000   Epoch: 19   Global Step: 246190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:03:01,646-Speed 3057.03 samples/sec   Loss 0.9019   LearningRate 0.0000   Epoch: 19   Global Step: 246200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:03:05,088-Speed 2975.46 samples/sec   Loss 0.8221   LearningRate 0.0000   Epoch: 19   Global Step: 246210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:08,498-Speed 3003.02 samples/sec   Loss 0.8630   LearningRate 0.0000   Epoch: 19   Global Step: 246220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:11,865-Speed 3042.09 samples/sec   Loss 0.8734   LearningRate 0.0000   Epoch: 19   Global Step: 246230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:15,202-Speed 3070.05 samples/sec   Loss 0.8986   LearningRate 0.0000   Epoch: 19   Global Step: 246240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:18,549-Speed 3060.20 samples/sec   Loss 0.8498   LearningRate 0.0000   Epoch: 19   Global Step: 246250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:21,955-Speed 3006.66 samples/sec   Loss 0.8741   LearningRate 0.0000   Epoch: 19   Global Step: 246260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:25,283-Speed 3078.74 samples/sec   Loss 0.8840   LearningRate 0.0000   Epoch: 19   Global Step: 246270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:28,601-Speed 3086.59 samples/sec   Loss 0.8575   LearningRate 0.0000   Epoch: 19   Global Step: 246280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:32,004-Speed 3009.72 samples/sec   Loss 0.8153   LearningRate 0.0000   Epoch: 19   Global Step: 246290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:35,402-Speed 3014.37 samples/sec   Loss 0.8691   LearningRate 0.0000   Epoch: 19   Global Step: 246300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:38,782-Speed 3030.32 samples/sec   Loss 0.8682   LearningRate 0.0000   Epoch: 19   Global Step: 246310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:03:42,124-Speed 3064.77 samples/sec   Loss 0.8840   LearningRate 0.0000   Epoch: 19   Global Step: 246320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:45,476-Speed 3057.92 samples/sec   Loss 0.8322   LearningRate 0.0000   Epoch: 19   Global Step: 246330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:48,814-Speed 3068.99 samples/sec   Loss 0.8253   LearningRate 0.0000   Epoch: 19   Global Step: 246340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:52,205-Speed 3020.57 samples/sec   Loss 0.8728   LearningRate 0.0000   Epoch: 19   Global Step: 246350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:55,623-Speed 2996.39 samples/sec   Loss 0.8995   LearningRate 0.0000   Epoch: 19   Global Step: 246360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:03:59,003-Speed 3030.07 samples/sec   Loss 0.9132   LearningRate 0.0000   Epoch: 19   Global Step: 246370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:04:02,385-Speed 3028.79 samples/sec   Loss 0.8113   LearningRate 0.0000   Epoch: 19   Global Step: 246380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:04:05,785-Speed 3012.57 samples/sec   Loss 0.8796   LearningRate 0.0000   Epoch: 19   Global Step: 246390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:04:09,174-Speed 3022.07 samples/sec   Loss 0.8510   LearningRate 0.0000   Epoch: 19   Global Step: 246400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:04:12,505-Speed 3075.74 samples/sec   Loss 0.8714   LearningRate 0.0000   Epoch: 19   Global Step: 246410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:04:15,888-Speed 3027.68 samples/sec   Loss 0.8811   LearningRate 0.0000   Epoch: 19   Global Step: 246420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:04:19,302-Speed 2999.98 samples/sec   Loss 0.8980   LearningRate 0.0000   Epoch: 19   Global Step: 246430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:04:22,637-Speed 3071.47 samples/sec   Loss 0.8583   LearningRate 0.0000   Epoch: 19   Global Step: 246440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:04:25,958-Speed 3084.14 samples/sec   Loss 0.8801   LearningRate 0.0000   Epoch: 19   Global Step: 246450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:04:29,395-Speed 2980.48 samples/sec   Loss 0.8724   LearningRate 0.0000   Epoch: 19   Global Step: 246460   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:04:32,784-Speed 3022.36 samples/sec   Loss 0.8925   LearningRate 0.0000   Epoch: 19   Global Step: 246470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:04:36,139-Speed 3053.60 samples/sec   Loss 0.8409   LearningRate 0.0000   Epoch: 19   Global Step: 246480   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:04:39,538-Speed 3012.95 samples/sec   Loss 0.8692   LearningRate 0.0000   Epoch: 19   Global Step: 246490   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:04:42,963-Speed 2990.16 samples/sec   Loss 0.8758   LearningRate 0.0000   Epoch: 19   Global Step: 246500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:04:46,302-Speed 3068.50 samples/sec   Loss 0.8719   LearningRate 0.0000   Epoch: 19   Global Step: 246510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:04:49,778-Speed 2946.55 samples/sec   Loss 0.8830   LearningRate 0.0000   Epoch: 19   Global Step: 246520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:04:53,168-Speed 3021.93 samples/sec   Loss 0.8390   LearningRate 0.0000   Epoch: 19   Global Step: 246530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:04:56,506-Speed 3068.44 samples/sec   Loss 0.8658   LearningRate 0.0000   Epoch: 19   Global Step: 246540   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:04:59,954-Speed 2971.31 samples/sec   Loss 0.9215   LearningRate 0.0000   Epoch: 19   Global Step: 246550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:03,299-Speed 3062.35 samples/sec   Loss 0.8761   LearningRate 0.0000   Epoch: 19   Global Step: 246560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:06,702-Speed 3009.55 samples/sec   Loss 0.8263   LearningRate 0.0000   Epoch: 19   Global Step: 246570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:10,183-Speed 2942.19 samples/sec   Loss 0.9012   LearningRate 0.0000   Epoch: 19   Global Step: 246580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:13,514-Speed 3075.21 samples/sec   Loss 0.8507   LearningRate 0.0000   Epoch: 19   Global Step: 246590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:16,884-Speed 3039.10 samples/sec   Loss 0.8513   LearningRate 0.0000   Epoch: 19   Global Step: 246600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:20,235-Speed 3057.21 samples/sec   Loss 0.8710   LearningRate 0.0000   Epoch: 19   Global Step: 246610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:23,622-Speed 3023.60 samples/sec   Loss 0.8897   LearningRate 0.0000   Epoch: 19   Global Step: 246620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:26,992-Speed 3039.33 samples/sec   Loss 0.8774   LearningRate 0.0000   Epoch: 19   Global Step: 246630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:30,394-Speed 3011.69 samples/sec   Loss 0.8786   LearningRate 0.0000   Epoch: 19   Global Step: 246640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:33,724-Speed 3075.62 samples/sec   Loss 0.8839   LearningRate 0.0000   Epoch: 19   Global Step: 246650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:05:37,116-Speed 3019.47 samples/sec   Loss 0.8828   LearningRate 0.0000   Epoch: 19   Global Step: 246660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:40,488-Speed 3037.18 samples/sec   Loss 0.8638   LearningRate 0.0000   Epoch: 19   Global Step: 246670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:43,842-Speed 3054.68 samples/sec   Loss 0.8496   LearningRate 0.0000   Epoch: 19   Global Step: 246680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:05:47,175-Speed 3073.06 samples/sec   Loss 0.8658   LearningRate 0.0000   Epoch: 19   Global Step: 246690   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:05:50,584-Speed 3004.22 samples/sec   Loss 0.9206   LearningRate 0.0000   Epoch: 19   Global Step: 246700   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:05:53,967-Speed 3028.03 samples/sec   Loss 0.8696   LearningRate 0.0000   Epoch: 19   Global Step: 246710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:05:57,332-Speed 3043.66 samples/sec   Loss 0.8424   LearningRate 0.0000   Epoch: 19   Global Step: 246720   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:00,777-Speed 2973.10 samples/sec   Loss 0.8808   LearningRate 0.0000   Epoch: 19   Global Step: 246730   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:04,192-Speed 2999.33 samples/sec   Loss 0.9038   LearningRate 0.0000   Epoch: 19   Global Step: 246740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:07,603-Speed 3002.25 samples/sec   Loss 0.8942   LearningRate 0.0000   Epoch: 19   Global Step: 246750   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:10,955-Speed 3056.41 samples/sec   Loss 0.8585   LearningRate 0.0000   Epoch: 19   Global Step: 246760   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:14,381-Speed 2989.80 samples/sec   Loss 0.8706   LearningRate 0.0000   Epoch: 19   Global Step: 246770   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:17,719-Speed 3068.06 samples/sec   Loss 0.8687   LearningRate 0.0000   Epoch: 19   Global Step: 246780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:21,083-Speed 3045.23 samples/sec   Loss 0.8651   LearningRate 0.0000   Epoch: 19   Global Step: 246790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:24,410-Speed 3078.86 samples/sec   Loss 0.8688   LearningRate 0.0000   Epoch: 19   Global Step: 246800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:27,760-Speed 3057.35 samples/sec   Loss 0.8938   LearningRate 0.0000   Epoch: 19   Global Step: 246810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:31,153-Speed 3018.11 samples/sec   Loss 0.8505   LearningRate 0.0000   Epoch: 19   Global Step: 246820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:34,594-Speed 2977.30 samples/sec   Loss 0.9154   LearningRate 0.0000   Epoch: 19   Global Step: 246830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:38,022-Speed 2987.77 samples/sec   Loss 0.8427   LearningRate 0.0000   Epoch: 19   Global Step: 246840   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:41,419-Speed 3014.83 samples/sec   Loss 0.8456   LearningRate 0.0000   Epoch: 19   Global Step: 246850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:44,849-Speed 2986.11 samples/sec   Loss 0.8641   LearningRate 0.0000   Epoch: 19   Global Step: 246860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:48,221-Speed 3037.76 samples/sec   Loss 0.8609   LearningRate 0.0000   Epoch: 19   Global Step: 246870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:51,598-Speed 3033.11 samples/sec   Loss 0.8483   LearningRate 0.0000   Epoch: 19   Global Step: 246880   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:06:55,032-Speed 2983.05 samples/sec   Loss 0.8682   LearningRate 0.0000   Epoch: 19   Global Step: 246890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:06:58,442-Speed 3003.78 samples/sec   Loss 0.8590   LearningRate 0.0000   Epoch: 19   Global Step: 246900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:01,827-Speed 3026.50 samples/sec   Loss 0.8558   LearningRate 0.0000   Epoch: 19   Global Step: 246910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:05,191-Speed 3044.97 samples/sec   Loss 0.8398   LearningRate 0.0000   Epoch: 19   Global Step: 246920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:08,584-Speed 3018.39 samples/sec   Loss 0.8597   LearningRate 0.0000   Epoch: 19   Global Step: 246930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:11,991-Speed 3006.35 samples/sec   Loss 0.8627   LearningRate 0.0000   Epoch: 19   Global Step: 246940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:15,321-Speed 3076.49 samples/sec   Loss 0.8664   LearningRate 0.0000   Epoch: 19   Global Step: 246950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:18,707-Speed 3024.48 samples/sec   Loss 0.8459   LearningRate 0.0000   Epoch: 19   Global Step: 246960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:22,088-Speed 3029.50 samples/sec   Loss 0.8899   LearningRate 0.0000   Epoch: 19   Global Step: 246970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:25,457-Speed 3040.25 samples/sec   Loss 0.8688   LearningRate 0.0000   Epoch: 19   Global Step: 246980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:28,903-Speed 2972.35 samples/sec   Loss 0.8604   LearningRate 0.0000   Epoch: 19   Global Step: 246990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:07:32,250-Speed 3063.15 samples/sec   Loss 0.8567   LearningRate 0.0000   Epoch: 19   Global Step: 247000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:07:35,675-Speed 2990.15 samples/sec   Loss 0.8444   LearningRate 0.0000   Epoch: 19   Global Step: 247010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:07:39,004-Speed 3076.60 samples/sec   Loss 0.8803   LearningRate 0.0000   Epoch: 19   Global Step: 247020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:42,361-Speed 3051.33 samples/sec   Loss 0.8366   LearningRate 0.0000   Epoch: 19   Global Step: 247030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:45,803-Speed 2975.99 samples/sec   Loss 0.8814   LearningRate 0.0000   Epoch: 19   Global Step: 247040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:49,332-Speed 2902.47 samples/sec   Loss 0.8836   LearningRate 0.0000   Epoch: 19   Global Step: 247050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:52,765-Speed 2983.39 samples/sec   Loss 0.9037   LearningRate 0.0000   Epoch: 19   Global Step: 247060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:56,241-Speed 2947.10 samples/sec   Loss 0.8864   LearningRate 0.0000   Epoch: 19   Global Step: 247070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:07:59,609-Speed 3041.80 samples/sec   Loss 0.8722   LearningRate 0.0000   Epoch: 19   Global Step: 247080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:08:03,010-Speed 3011.75 samples/sec   Loss 0.8552   LearningRate 0.0000   Epoch: 19   Global Step: 247090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:08:06,418-Speed 3005.32 samples/sec   Loss 0.8494   LearningRate 0.0000   Epoch: 19   Global Step: 247100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:08:09,806-Speed 3023.27 samples/sec   Loss 0.8416   LearningRate 0.0000   Epoch: 19   Global Step: 247110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:08:13,169-Speed 3045.13 samples/sec   Loss 0.8578   LearningRate 0.0000   Epoch: 19   Global Step: 247120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:08:16,531-Speed 3047.13 samples/sec   Loss 0.8813   LearningRate 0.0000   Epoch: 19   Global Step: 247130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:08:19,957-Speed 2989.27 samples/sec   Loss 0.9177   LearningRate 0.0000   Epoch: 19   Global Step: 247140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:08:23,353-Speed 3016.99 samples/sec   Loss 0.8148   LearningRate 0.0000   Epoch: 19   Global Step: 247150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:08:26,746-Speed 3018.15 samples/sec   Loss 0.8421   LearningRate 0.0000   Epoch: 19   Global Step: 247160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:08:30,129-Speed 3028.37 samples/sec   Loss 0.8795   LearningRate 0.0000   Epoch: 19   Global Step: 247170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:08:33,589-Speed 2960.12 samples/sec   Loss 0.8816   LearningRate 0.0000   Epoch: 19   Global Step: 247180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:08:36,986-Speed 3015.13 samples/sec   Loss 0.8766   LearningRate 0.0000   Epoch: 19   Global Step: 247190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:08:40,340-Speed 3054.44 samples/sec   Loss 0.8369   LearningRate 0.0000   Epoch: 19   Global Step: 247200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:08:43,684-Speed 3062.29 samples/sec   Loss 0.8463   LearningRate 0.0000   Epoch: 19   Global Step: 247210   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:08:47,120-Speed 2981.02 samples/sec   Loss 0.8608   LearningRate 0.0000   Epoch: 19   Global Step: 247220   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:08:50,530-Speed 3003.82 samples/sec   Loss 0.8496   LearningRate 0.0000   Epoch: 19   Global Step: 247230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:08:53,915-Speed 3026.12 samples/sec   Loss 0.8489   LearningRate 0.0000   Epoch: 19   Global Step: 247240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:08:57,258-Speed 3063.65 samples/sec   Loss 0.8857   LearningRate 0.0000   Epoch: 19   Global Step: 247250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:09:00,661-Speed 3010.14 samples/sec   Loss 0.8519   LearningRate 0.0000   Epoch: 19   Global Step: 247260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:09:04,019-Speed 3050.45 samples/sec   Loss 0.8728   LearningRate 0.0000   Epoch: 19   Global Step: 247270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:09:07,382-Speed 3046.11 samples/sec   Loss 0.8725   LearningRate 0.0000   Epoch: 19   Global Step: 247280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:09:10,747-Speed 3043.71 samples/sec   Loss 0.8552   LearningRate 0.0000   Epoch: 19   Global Step: 247290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:09:14,114-Speed 3041.64 samples/sec   Loss 0.8906   LearningRate 0.0000   Epoch: 19   Global Step: 247300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:09:17,477-Speed 3045.77 samples/sec   Loss 0.8261   LearningRate 0.0000   Epoch: 19   Global Step: 247310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:09:20,833-Speed 3052.18 samples/sec   Loss 0.8727   LearningRate 0.0000   Epoch: 19   Global Step: 247320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:09:24,169-Speed 3071.25 samples/sec   Loss 0.8670   LearningRate 0.0000   Epoch: 19   Global Step: 247330   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:09:27,587-Speed 2996.05 samples/sec   Loss 0.8436   LearningRate 0.0000   Epoch: 19   Global Step: 247340   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:09:31,015-Speed 2988.21 samples/sec   Loss 0.8710   LearningRate 0.0000   Epoch: 19   Global Step: 247350   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:09:34,342-Speed 3078.86 samples/sec   Loss 0.8516   LearningRate 0.0000   Epoch: 19   Global Step: 247360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:09:37,713-Speed 3038.62 samples/sec   Loss 0.8882   LearningRate 0.0000   Epoch: 19   Global Step: 247370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:09:41,225-Speed 2917.16 samples/sec   Loss 0.8683   LearningRate 0.0000   Epoch: 19   Global Step: 247380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:09:44,701-Speed 2945.77 samples/sec   Loss 0.8281   LearningRate 0.0000   Epoch: 19   Global Step: 247390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:09:48,098-Speed 3015.90 samples/sec   Loss 0.8473   LearningRate 0.0000   Epoch: 19   Global Step: 247400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:09:51,538-Speed 2978.03 samples/sec   Loss 0.8788   LearningRate 0.0000   Epoch: 19   Global Step: 247410   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:09:55,020-Speed 2941.44 samples/sec   Loss 0.9154   LearningRate 0.0000   Epoch: 19   Global Step: 247420   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:09:58,344-Speed 3080.99 samples/sec   Loss 0.8744   LearningRate 0.0000   Epoch: 19   Global Step: 247430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:01,751-Speed 3006.71 samples/sec   Loss 0.8675   LearningRate 0.0000   Epoch: 19   Global Step: 247440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:05,234-Speed 2940.71 samples/sec   Loss 0.8186   LearningRate 0.0000   Epoch: 19   Global Step: 247450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:08,669-Speed 2982.02 samples/sec   Loss 0.8933   LearningRate 0.0000   Epoch: 19   Global Step: 247460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:12,041-Speed 3037.85 samples/sec   Loss 0.8838   LearningRate 0.0000   Epoch: 19   Global Step: 247470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:15,447-Speed 3006.69 samples/sec   Loss 0.8221   LearningRate 0.0000   Epoch: 19   Global Step: 247480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:18,806-Speed 3049.19 samples/sec   Loss 0.8672   LearningRate 0.0000   Epoch: 19   Global Step: 247490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:22,194-Speed 3024.13 samples/sec   Loss 0.9465   LearningRate 0.0000   Epoch: 19   Global Step: 247500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:25,557-Speed 3045.47 samples/sec   Loss 0.8676   LearningRate 0.0000   Epoch: 19   Global Step: 247510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:28,923-Speed 3043.41 samples/sec   Loss 0.8531   LearningRate 0.0000   Epoch: 19   Global Step: 247520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:32,266-Speed 3063.94 samples/sec   Loss 0.8433   LearningRate 0.0000   Epoch: 19   Global Step: 247530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:10:35,550-Speed 3118.69 samples/sec   Loss 0.8528   LearningRate 0.0000   Epoch: 19   Global Step: 247540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:38,899-Speed 3058.64 samples/sec   Loss 0.8397   LearningRate 0.0000   Epoch: 19   Global Step: 247550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:42,279-Speed 3030.10 samples/sec   Loss 0.8778   LearningRate 0.0000   Epoch: 19   Global Step: 247560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:45,700-Speed 2994.29 samples/sec   Loss 0.8305   LearningRate 0.0000   Epoch: 19   Global Step: 247570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:49,110-Speed 3004.03 samples/sec   Loss 0.8314   LearningRate 0.0000   Epoch: 19   Global Step: 247580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:52,463-Speed 3054.54 samples/sec   Loss 0.8608   LearningRate 0.0000   Epoch: 19   Global Step: 247590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:55,867-Speed 3008.99 samples/sec   Loss 0.8571   LearningRate 0.0000   Epoch: 19   Global Step: 247600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:10:59,317-Speed 2968.87 samples/sec   Loss 0.8671   LearningRate 0.0000   Epoch: 19   Global Step: 247610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:02,703-Speed 3025.91 samples/sec   Loss 0.8457   LearningRate 0.0000   Epoch: 19   Global Step: 247620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:06,079-Speed 3034.03 samples/sec   Loss 0.8598   LearningRate 0.0000   Epoch: 19   Global Step: 247630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:09,463-Speed 3026.62 samples/sec   Loss 0.8928   LearningRate 0.0000   Epoch: 19   Global Step: 247640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:12,848-Speed 3026.32 samples/sec   Loss 0.8555   LearningRate 0.0000   Epoch: 19   Global Step: 247650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:16,219-Speed 3037.51 samples/sec   Loss 0.8620   LearningRate 0.0000   Epoch: 19   Global Step: 247660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:19,654-Speed 2982.13 samples/sec   Loss 0.8411   LearningRate 0.0000   Epoch: 19   Global Step: 247670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:22,999-Speed 3062.71 samples/sec   Loss 0.8590   LearningRate 0.0000   Epoch: 19   Global Step: 247680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:26,430-Speed 2984.76 samples/sec   Loss 0.7916   LearningRate 0.0000   Epoch: 19   Global Step: 247690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:29,776-Speed 3062.10 samples/sec   Loss 0.8741   LearningRate 0.0000   Epoch: 19   Global Step: 247700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:33,155-Speed 3030.77 samples/sec   Loss 0.8689   LearningRate 0.0000   Epoch: 19   Global Step: 247710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:36,525-Speed 3039.62 samples/sec   Loss 0.8880   LearningRate 0.0000   Epoch: 19   Global Step: 247720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:39,841-Speed 3089.24 samples/sec   Loss 0.8716   LearningRate 0.0000   Epoch: 19   Global Step: 247730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:11:43,229-Speed 3022.69 samples/sec   Loss 0.8795   LearningRate 0.0000   Epoch: 19   Global Step: 247740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:11:46,564-Speed 3071.88 samples/sec   Loss 0.8736   LearningRate 0.0000   Epoch: 19   Global Step: 247750   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:11:49,981-Speed 2997.23 samples/sec   Loss 0.8129   LearningRate 0.0000   Epoch: 19   Global Step: 247760   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:11:53,336-Speed 3052.96 samples/sec   Loss 0.8648   LearningRate 0.0000   Epoch: 19   Global Step: 247770   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:11:56,684-Speed 3059.44 samples/sec   Loss 0.8706   LearningRate 0.0000   Epoch: 19   Global Step: 247780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:00,029-Speed 3061.73 samples/sec   Loss 0.8913   LearningRate 0.0000   Epoch: 19   Global Step: 247790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:03,412-Speed 3027.95 samples/sec   Loss 0.8784   LearningRate 0.0000   Epoch: 19   Global Step: 247800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:06,750-Speed 3069.05 samples/sec   Loss 0.8767   LearningRate 0.0000   Epoch: 19   Global Step: 247810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:10,068-Speed 3086.66 samples/sec   Loss 0.8618   LearningRate 0.0000   Epoch: 19   Global Step: 247820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:13,456-Speed 3023.63 samples/sec   Loss 0.8897   LearningRate 0.0000   Epoch: 19   Global Step: 247830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:16,896-Speed 2977.61 samples/sec   Loss 0.8864   LearningRate 0.0000   Epoch: 19   Global Step: 247840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:12:20,238-Speed 3064.98 samples/sec   Loss 0.8721   LearningRate 0.0000   Epoch: 19   Global Step: 247850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:12:23,618-Speed 3030.42 samples/sec   Loss 0.8893   LearningRate 0.0000   Epoch: 19   Global Step: 247860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:12:26,980-Speed 3046.34 samples/sec   Loss 0.8574   LearningRate 0.0000   Epoch: 19   Global Step: 247870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:12:30,359-Speed 3031.44 samples/sec   Loss 0.9115   LearningRate 0.0000   Epoch: 19   Global Step: 247880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:12:33,681-Speed 3082.93 samples/sec   Loss 0.8982   LearningRate 0.0000   Epoch: 19   Global Step: 247890   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:37,024-Speed 3064.32 samples/sec   Loss 0.9015   LearningRate 0.0000   Epoch: 19   Global Step: 247900   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:40,440-Speed 2998.12 samples/sec   Loss 0.8183   LearningRate 0.0000   Epoch: 19   Global Step: 247910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:43,801-Speed 3048.28 samples/sec   Loss 0.8955   LearningRate 0.0000   Epoch: 19   Global Step: 247920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:47,256-Speed 2964.38 samples/sec   Loss 0.8328   LearningRate 0.0000   Epoch: 19   Global Step: 247930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:50,621-Speed 3043.98 samples/sec   Loss 0.8281   LearningRate 0.0000   Epoch: 19   Global Step: 247940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:54,039-Speed 2997.42 samples/sec   Loss 0.8796   LearningRate 0.0000   Epoch: 19   Global Step: 247950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:12:57,402-Speed 3044.98 samples/sec   Loss 0.8785   LearningRate 0.0000   Epoch: 19   Global Step: 247960   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:13:00,865-Speed 2958.15 samples/sec   Loss 0.8464   LearningRate 0.0000   Epoch: 19   Global Step: 247970   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:13:04,234-Speed 3040.33 samples/sec   Loss 0.8637   LearningRate 0.0000   Epoch: 19   Global Step: 247980   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:13:07,592-Speed 3050.82 samples/sec   Loss 0.8939   LearningRate 0.0000   Epoch: 19   Global Step: 247990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:11,035-Speed 2974.78 samples/sec   Loss 0.8595   LearningRate 0.0000   Epoch: 19   Global Step: 248000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:14,371-Speed 3070.45 samples/sec   Loss 0.8731   LearningRate 0.0000   Epoch: 19   Global Step: 248010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:17,762-Speed 3020.66 samples/sec   Loss 0.8925   LearningRate 0.0000   Epoch: 19   Global Step: 248020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:21,152-Speed 3021.35 samples/sec   Loss 0.8759   LearningRate 0.0000   Epoch: 19   Global Step: 248030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:24,581-Speed 2987.00 samples/sec   Loss 0.8543   LearningRate 0.0000   Epoch: 19   Global Step: 248040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:28,006-Speed 2990.18 samples/sec   Loss 0.9093   LearningRate 0.0000   Epoch: 19   Global Step: 248050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:31,332-Speed 3079.96 samples/sec   Loss 0.8513   LearningRate 0.0000   Epoch: 19   Global Step: 248060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:34,722-Speed 3022.01 samples/sec   Loss 0.8442   LearningRate 0.0000   Epoch: 19   Global Step: 248070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:38,077-Speed 3052.13 samples/sec   Loss 0.8545   LearningRate 0.0000   Epoch: 19   Global Step: 248080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:41,397-Speed 3086.11 samples/sec   Loss 0.8527   LearningRate 0.0000   Epoch: 19   Global Step: 248090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:13:44,749-Speed 3055.70 samples/sec   Loss 0.8687   LearningRate 0.0000   Epoch: 19   Global Step: 248100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:13:48,113-Speed 3044.89 samples/sec   Loss 0.9154   LearningRate 0.0000   Epoch: 19   Global Step: 248110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:51,470-Speed 3051.48 samples/sec   Loss 0.8269   LearningRate 0.0000   Epoch: 19   Global Step: 248120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:54,898-Speed 2988.07 samples/sec   Loss 0.8703   LearningRate 0.0000   Epoch: 19   Global Step: 248130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:13:58,233-Speed 3071.54 samples/sec   Loss 0.8502   LearningRate 0.0000   Epoch: 19   Global Step: 248140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:01,629-Speed 3015.25 samples/sec   Loss 0.8659   LearningRate 0.0000   Epoch: 19   Global Step: 248150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:04,966-Speed 3070.40 samples/sec   Loss 0.8875   LearningRate 0.0000   Epoch: 19   Global Step: 248160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:08,307-Speed 3065.47 samples/sec   Loss 0.8677   LearningRate 0.0000   Epoch: 19   Global Step: 248170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:11,668-Speed 3047.91 samples/sec   Loss 0.8463   LearningRate 0.0000   Epoch: 19   Global Step: 248180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:15,029-Speed 3047.62 samples/sec   Loss 0.8845   LearningRate 0.0000   Epoch: 19   Global Step: 248190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:18,430-Speed 3012.02 samples/sec   Loss 0.8374   LearningRate 0.0000   Epoch: 19   Global Step: 248200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:21,790-Speed 3048.41 samples/sec   Loss 0.9028   LearningRate 0.0000   Epoch: 19   Global Step: 248210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:14:25,141-Speed 3057.14 samples/sec   Loss 0.8676   LearningRate 0.0000   Epoch: 19   Global Step: 248220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:28,506-Speed 3043.33 samples/sec   Loss 0.8482   LearningRate 0.0000   Epoch: 19   Global Step: 248230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:31,876-Speed 3039.49 samples/sec   Loss 0.8928   LearningRate 0.0000   Epoch: 19   Global Step: 248240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:35,205-Speed 3077.32 samples/sec   Loss 0.8320   LearningRate 0.0000   Epoch: 19   Global Step: 248250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:38,553-Speed 3059.87 samples/sec   Loss 0.8276   LearningRate 0.0000   Epoch: 19   Global Step: 248260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:41,954-Speed 3010.93 samples/sec   Loss 0.8727   LearningRate 0.0000   Epoch: 19   Global Step: 248270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:45,337-Speed 3028.29 samples/sec   Loss 0.8716   LearningRate 0.0000   Epoch: 19   Global Step: 248280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:48,747-Speed 3003.78 samples/sec   Loss 0.8395   LearningRate 0.0000   Epoch: 19   Global Step: 248290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:52,190-Speed 2975.65 samples/sec   Loss 0.8636   LearningRate 0.0000   Epoch: 19   Global Step: 248300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:55,596-Speed 3007.10 samples/sec   Loss 0.8418   LearningRate 0.0000   Epoch: 19   Global Step: 248310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:14:58,954-Speed 3050.57 samples/sec   Loss 0.9275   LearningRate 0.0000   Epoch: 19   Global Step: 248320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-28 01:15:02,276-Speed 3082.97 samples/sec   Loss 0.9029   LearningRate 0.0000   Epoch: 19   Global Step: 248330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:15:05,633-Speed 3051.48 samples/sec   Loss 0.8403   LearningRate 0.0000   Epoch: 19   Global Step: 248340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:15:09,009-Speed 3034.04 samples/sec   Loss 0.8584   LearningRate 0.0000   Epoch: 19   Global Step: 248350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-28 01:15:12,318-Speed 3095.55 samples/sec   Loss 0.8625   LearningRate 0.0000   Epoch: 19   Global Step: 248360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:15:15,735-Speed 2997.70 samples/sec   Loss 0.8571   LearningRate 0.0000   Epoch: 19   Global Step: 248370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:15:19,101-Speed 3042.46 samples/sec   Loss 0.8881   LearningRate 0.0000   Epoch: 19   Global Step: 248380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:15:22,519-Speed 2996.68 samples/sec   Loss 0.8693   LearningRate 0.0000   Epoch: 19   Global Step: 248390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:15:25,989-Speed 2952.44 samples/sec   Loss 0.8958   LearningRate 0.0000   Epoch: 19   Global Step: 248400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:15:29,752-Speed 2722.17 samples/sec   Loss 0.8271   LearningRate 0.0000   Epoch: 19   Global Step: 248410   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-28 01:15:33,093-Speed 3065.85 samples/sec   Loss 0.8611   LearningRate 0.0000   Epoch: 19   Global Step: 248420   Fp16 Grad Scale: 16384   Required: -0 hours